Drug-induced Adverse Events

Synthesis of human parainfluenza virus 2 nucleocapsid protein in yeast as nucleocapsid-like particles and investigation of its antigenic structure.
Appl Microbiol Biotechnol. 2016 May;100(10):4523-34
Authors: Bulavaitė A, Lasickienė R, Vaitiekaitė A, Sasnauskas K, Žvirblienė A
Abstract
The aim of this study was to investigate the suitability of yeast Saccharomyces cerevisiae expression system for the production of human parainfluenza virus type 2 (HPIV2) nucleocapsid (N) protein in the form of nucleocapsid-like particles (NLPs) and to characterize its antigenic structure. The gene encoding HPIV2 N amino acid (aa) sequence RefSeq NP_598401.1 was cloned into the galactose-inducible S. cerevisiae expression vector and its high-level expression was achieved. However, this recombinant HPIV2 N protein did not form NLPs. The PCR mutagenesis was carried out to change the encoded aa residues to the ones conserved across HPIV2 isolates. Synthesis of the modified proteins in yeast demonstrated that the single aa substitution NP_598401.1:p.D331V was sufficient for the self-assembly of NLPs. The significance of certain aa residues in this position was confirmed by analysing HPIV2 N protein structure models. To characterize the antigenic structure of NLP-forming HPIV2 N protein, a panel of monoclonal antibodies (MAbs) was generated. The majority of the MAbs raised against the recombinant NLPs recognized HPIV2-infected cells suggesting the antigenic similarity between the recombinant and virus-derived HPIV2 N protein. Fine epitope mapping revealed the C-terminal part (aa 386-504) as the main antigenic region of the HPIV2 N protein. In conclusion, the current study provides new data on the impact of HPIV2 N protein sequence variants on the NLP self-assembly and demonstrates an efficient production of recombinant HPIV2 N protein in the form of NLPs.
PMID: 26821928 [PubMed - indexed for MEDLINE]
Social media for arthritis-related comparative effectiveness and safety research and the impact of direct-to-consumer advertising.
Arthritis Res Ther. 2017 Mar 07;19(1):48
Authors: Curtis JR, Chen L, Higginbotham P, Nowell WB, Gal-Levy R, Willig J, Safford M, Coe J, O'Hara K, Sa'adon R
Abstract
BACKGROUND: Social media may complement traditional data sources to answer comparative effectiveness/safety questions after medication licensure.
METHODS: The Treato platform was used to analyze all publicly available social media data including Facebook, blogs, and discussion boards for posts mentioning inflammatory arthritis (e.g. rheumatoid, psoriatic). Safety events were self-reported by patients and mapped to medical ontologies, resolving synonyms. Disease and symptom-related treatment indications were manually redacted. The units of analysis were unique terms in posts. Pre-specified conditions (e.g. herpes zoster (HZ)) were selected based upon safety signals from clinical trials and reported as pairwise odds ratios (ORs); drugs were compared with Fisher's exact test. Empirically identified events were analyzed using disproportionality analysis and reported as relative reporting ratios (RRRs). The accuracy of a natural language processing (NLP) classifier to identify cases of shingles associated with arthritis medications was assessed.
RESULTS: As of October 2015, there were 785,656 arthritis-related posts. Posts were predominantly US posts (75%) from patient authors (87%) under 40 years of age (61%). For HZ posts (n = 1815), ORs were significantly increased with tofacitinib versus other rheumatoid arthritis therapies. ORs for mentions of perforated bowel (n = 13) were higher with tocilizumab versus other therapies. RRRs associated with tofacitinib were highest in conditions related to baldness and hair regrowth, infections and cancer. The NLP classifier had a positive predictive value of 91% to identify HZ. There was a threefold increase in posts following television direct-to-consumer advertisement (p = 0.04); posts expressing medication safety concerns were significantly more frequent than favorable posts.
CONCLUSION: Social media is a challenging yet promising data source that may complement traditional approaches for comparative effectiveness research for new medications.
PMID: 28270190 [PubMed - in process]
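The statistics named in the METHODS above (pairwise odds ratios, Fisher's exact test, relative reporting ratios) can be illustrated on a single 2x2 table of post counts. This is a minimal sketch that uses only the standard library; the function names and the counts in the usage note are hypothetical and are not taken from the study, and the RRR shown is the plain observed/expected form, which may differ from the exact disproportionality method the authors used.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a: posts mentioning the drug and the event, b: drug without the event,
    c: other drugs with the event, d: other drugs without the event.
    Raises ZeroDivisionError when b * c == 0.
    """
    return (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

def relative_reporting_ratio(a, b, c, d):
    """Observed drug-event count divided by the count expected if drug
    and event were mentioned independently of each other."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return a / expected
```

For example, with the made-up table a=8, b=2, c=1, d=5, `odds_ratio(8, 2, 1, 5)` is 20.0 and `fisher_exact_two_sided(8, 2, 1, 5)` is about 0.035.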
Controlling testing volume for respiratory viruses using machine learning and text mining.
AMIA Annu Symp Proc. 2016;2016:1910-1919
Authors: Mai MV, Krauthammer M
Abstract
Viral testing for pediatric inpatients with respiratory symptoms is common, with considerable associated charges. In an attempt to reduce testing volumes, we studied whether data available at the time of admission could aid in identifying children with low likelihood of having a particular viral origin of their symptoms, and thus safely forgo broad viral testing. We collected clinical data for 1,685 pediatric inpatients receiving respiratory virus testing from 2010-2012. Machine-learning on the data allowed us to construct pre-test models predicting whether a patient would test positive for a particular virus. Text mining improved the predictions for one viral test. Cost-sensitive models optimized for test sensitivity showed reasonable test specificities and an ability to reduce test volume by up to 46% for single viral tests. We conclude that diverse forms of data in the electronic medical record can be used productively to build models that help physicians reduce testing volumes.
PMID: 28269950 [PubMed - in process]
Ensembles of NLP Tools for Data Element Extraction from Clinical Notes.
AMIA Annu Symp Proc. 2016;2016:1880-1889
Authors: Kuo TT, Rao P, Maehara C, Doan S, Chaparro JD, Day ME, Farcas C, Ohno-Machado L, Hsu CN
Abstract
Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.
PMID: 28269947 [PubMed - in process]
Investigating Longitudinal Tobacco Use Information from Social History and Clinical Notes in the Electronic Health Record.
AMIA Annu Symp Proc. 2016;2016:1209-1218
Authors: Wang Y, Chen ES, Pakhomov S, Lindemann E, Melton GB
Abstract
The electronic health record (EHR) provides an opportunity for improved use of clinical documentation including leveraging tobacco use information by clinicians and researchers. In this study, we investigated the content, consistency, and completeness of tobacco use data from structured and unstructured sources in the EHR. A natural language processing (NLP) pipeline was utilized to extract details about tobacco use from clinical notes and free-text tobacco use comments within the social history module of an EHR system. We analyzed the consistency of tobacco use information within clinical notes, comments, and available structured fields for tobacco use. Our results indicate that structured fields for tobacco use alone may not be able to provide complete tobacco use information. While there was better consistency for some elements (e.g., status and type), inconsistencies were found particularly for temporal information. Further work is needed to improve tobacco use information integration from different parts of the EHR.
PMID: 28269918 [PubMed - in process]
Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data.
AMIA Annu Symp Proc. 2016;2016:560-569
Authors: Finley GP, Pakhomov SV, McEwan R, Melton GB
Abstract
Abbreviation disambiguation in clinical texts is a problem handled well by fully supervised machine learning methods. Acquiring training data, however, is expensive and would be impractical for large numbers of abbreviations in specialized corpora. An alternative is a semi-supervised approach, in which training data are automatically generated by substituting long forms in natural text with their corresponding abbreviations. Most prior implementations of this method either focus on very few abbreviations or do not test on real-world data. We present a realistic use case by testing several semi-supervised classification algorithms on a large hand-annotated medical record of occurrences of 74 ambiguous abbreviations. Despite notable differences between training and test corpora, classifiers achieve up to 90% accuracy. Our tests demonstrate that semi-supervised abbreviation disambiguation is a viable and extensible option for medical NLP systems.
PMID: 28269852 [PubMed - in process]
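The data-generation step described in the abstract above (substituting long forms in natural text with their abbreviations) might look roughly like the sketch below. The sense inventory `SENSES` and the function name `make_training_examples` are hypothetical and stand in for whatever resources the authors actually used; the sketch only shows how a sentence containing a known long form becomes a machine-labeled example.

```python
import re

# Hypothetical sense inventory: abbreviation -> candidate long forms.
SENSES = {
    "RA": ["rheumatoid arthritis", "right atrium"],
}

def make_training_examples(sentences, senses=SENSES):
    """Build (text, abbreviation, label) triples without manual annotation.

    Each occurrence of a long form is replaced by its abbreviation, and the
    replaced long form is kept as the sense label for that example.
    """
    examples = []
    for sentence in sentences:
        for abbr, long_forms in senses.items():
            for long_form in long_forms:
                pattern = re.compile(
                    r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE
                )
                if pattern.search(sentence):
                    examples.append((pattern.sub(abbr, sentence), abbr, long_form))
    return examples
```

Sentences that contain no known long form simply yield no example, so the size of the generated training set depends on how often long forms appear in the source corpus.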
Automated Detection of Privacy Sensitive Conditions in C-CDAs: Security Labeling Services at the Department of Veterans Affairs.
AMIA Annu Symp Proc. 2016;2016:332-341
Authors: Bouhaddou O, Davis M, Donahue M, Mallia A, Griffin S, Teal J, Nebeker J
Abstract
Care coordination across healthcare organizations depends upon health information exchange. Various policies and laws govern permissible exchange, particularly when the information includes privacy sensitive conditions. The Department of Veterans Affairs (VA) privacy policy has required either blanket consent or manual sensitivity review prior to exchanging any health information. The VA experience has been an expensive, administratively demanding burden on staff and Veterans alike, particularly for patients without privacy sensitive conditions. Until recently, automatic sensitivity determination has not been feasible. This paper proposes a policy-driven algorithmic approach (Security Labeling Service or SLS) to health information exchange that automatically detects the presence or absence of specific privacy sensitive conditions and then requires a Veteran-signed consent for release only when such conditions are actually present. The SLS was applied successfully to a sample of real patient Consolidated-Clinical Document Architecture (C-CDA) documents. The SLS identified standard terminology codes by both parsing structured entries and analyzing textual information using Natural Language Processing (NLP).
PMID: 28269828 [PubMed - in process]
Visualizing patient journals by combining vital signs monitoring and natural language processing.
Conf Proc IEEE Eng Med Biol Soc. 2016 Aug;2016:2529-2532
Authors: Vilic A, Petersen JA, Hoppe K, Sorensen HB
Abstract
This paper presents a data-driven approach to graphically presenting text-based patient journals while still maintaining all textual information. The system first creates a timeline representation of a patient's physiological condition during an admission, which is assessed by electronically monitoring vital signs and then combining these into Early Warning Scores (EWS). Thereafter, techniques from Natural Language Processing (NLP) are applied to the existing patient journal to extract all entries. Finally, the two methods are combined into an interactive timeline featuring the ability to see drastic changes in the patient's health, thereby enabling staff to see where in the journal critical events have taken place.
PMID: 28268838 [PubMed - in process]
S2NI: a mobile platform for nutrition monitoring from spoken data.
Conf Proc IEEE Eng Med Biol Soc. 2016 Aug;2016:1991-1994
Authors: Hezarjaribi N, Reynolds CA, Miller DT, Chaytor N, Ghasemzadeh H
Abstract
Diet and physical activity are important lifestyle and behavioral factors in self-management and prevention of many chronic diseases. Mobile sensors such as accelerometers have been used in the past to objectively measure physical activity or detect eating time. Diet monitoring, however, still relies on self-recorded data by end users where individuals use mobile devices for recording nutrition intake by either entering text or taking images. Such approaches have shown low adherence in technology adoption and achieve only moderate accuracy. In this paper, we propose development and validation of Speech-to-Nutrient-Information (S2NI), a comprehensive nutrition monitoring system that combines speech processing, natural language processing, and text mining in a unified platform to extract nutrient information such as calorie intake from spoken data. After converting the voice data to text, we identify food name and portion size information within the text. We then develop a tiered matching algorithm to search the food name in our nutrition database and to accurately compute calorie intake. Due to its pervasive nature and ease of use, S2NI enables users to report their diet routine more frequently and at anytime through their smartphone. We evaluate S2NI using real data collected with 10 participants. Our experimental results show that S2NI achieves 80.6% accuracy in computing calorie intake.
PMID: 28268720 [PubMed - in process]
Unsupervised Ensemble Ranking of Terms in Electronic Health Record Notes Based on Their Importance to Patients.
J Biomed Inform. 2017 Mar 03;:
Authors: Chen J, Yu H
Abstract
BACKGROUND: Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Targeted education can then be developed to improve patient EHR comprehension and the quality of care.
OBJECTIVE: The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients.
METHODS: We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views (rankers) for term importance: patient use of medical concepts, document-level term salience, word-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and used the four single-view rankers as baselines. In addition, we implemented three benchmark unsupervised ensemble ranking methods as strong baselines.
RESULTS: FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. Both performance scores significantly exceeded the corresponding scores from the four single rankers (P<.001). FIT also outperformed the three ensemble rankers for most metrics. Its performance is relatively insensitive to its parameter.
CONCLUSIONS: FIT can automatically rank EHR terms important to patients. It may help develop future interventions to improve quality of care. By using unsupervised learning as well as a robust and flexible framework for information fusion, FIT can be readily applied to other domains and applications.
PMID: 28267590 [PubMed - as supplied by publisher]
Metabolomic network analysis of estrogen-stimulated MCF-7 cells: a comparison of overrepresentation analysis, quantitative enrichment analysis and pathway analysis versus metabolite network analysis.
Arch Toxicol. 2017 Jan;91(1):217-230
Authors: Maertens A, Bouhifd M, Zhao L, Odwin-DaCosta S, Kleensang A, Yager JD, Hartung T
Abstract
In the context of the Human Toxome project, mass spectroscopy-based metabolomics characterization of estrogen-stimulated MCF-7 cells was studied in order to support the untargeted deduction of pathways of toxicity. A targeted and untargeted approach using overrepresentation analysis (ORA), quantitative enrichment analysis (QEA) and pathway analysis (PA) and a metabolite network approach were compared. Any untargeted approach necessarily has some noise in the data owing to artifacts, outliers and misidentified metabolites. Depending on the chemical analytical choices (sample extraction, chromatography, instrument and settings, etc.), only a partial representation of all metabolites will be achieved, biased by both the analytical methods and the database used to identify the metabolites. Here, we show on the one hand that using a data analysis approach based exclusively on pathway annotations has the potential to miss much that is of interest and, in the case of misidentified metabolites, can produce perturbed pathways that are statistically significant yet uninformative for the biological sample at hand. On the other hand, a targeted approach, by narrowing its focus and minimizing (but not eliminating) misidentifications, renders the likelihood of a spurious pathway much smaller, but the limited number of metabolites also makes statistical significance harder to achieve. To avoid an analysis dependent on pathways, we built a de novo network using all metabolites that were different at 24 h with and without estrogen with a p value <0.01 (53) in the STITCH database, which links metabolites based on known reactions in the main metabolic network pathways but also based on experimental evidence and text mining. The resulting network contained a "connected component" of 43 metabolites and helped identify non-endogenous metabolites as well as pathways not visible by annotation-based approaches. 
Moreover, the most highly connected metabolites (energy metabolites such as pyruvate and alpha-ketoglutarate, as well as amino acids) showed only a modest change between proliferation with and without estrogen. Here, we demonstrate that estrogen has subtle but potentially phenotypically important alterations in the acyl-carnitine fatty acids, acetyl-putrescine and succinoadenosine, in addition to likely subtle changes in key energy metabolites that, however, could not be verified consistently given the technical limitations of this approach. Finally, we show that a network-based approach combined with text mining identifies pathways that would otherwise neither be considered statistically significant on their own nor be identified via ORA, QEA, or PA.
PMID: 27039105 [PubMed - indexed for MEDLINE]
BIOMedical Search Engine Framework: Lightweight and customized implementation of domain-specific biomedical search engines.
Comput Methods Programs Biomed. 2016 Jul;131:63-77
Authors: Jácome AG, Fdez-Riverola F, Lourenço A
Abstract
BACKGROUND AND OBJECTIVES: Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines and provide an attractive alternative to create personalized and enhanced search experiences. Therefore, this work introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces.
METHODS: The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the integration of the search engine into existing systems or a complete web interface personalization.
RESULTS: The construction of the Smart Drug Search is described as proof-of-concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and similar topics. The keyword-based queries of the users are transformed into concepts and search results are presented and ranked accordingly. The semantic graph view portrays all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of the concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations meaningful to that particular scope of research. Conversely, indirect concept associations, i.e. concepts related by other intermediary concepts, can be useful to integrate information from different studies and look into non-trivial relations.
CONCLUSIONS: The BIOMedical Search Engine Framework supports the development of domain-specific search engines. The key strengths of the framework are modularity and extensibility in terms of software design, the use of open-source consolidated Web technologies, and the ability to integrate any number of biomedical text mining tools and information resources. Currently, the Smart Drug Search keeps over 1,186,000 documents, containing more than 11,854,000 annotations for 77,200 different concepts. The Smart Drug Search is publicly accessible at http://sing.ei.uvigo.es/sds/. The BIOMedical Search Engine Framework is freely available for non-commercial use at https://github.com/agjacome/biomsef.
PMID: 27265049 [PubMed - indexed for MEDLINE]
Text mining for improved exposure assessment.
PLoS One. 2017;12(3):e0173132
Authors: Larsson K, Baker S, Silins I, Guo Y, Stenius U, Korhonen A, Berglund M
Abstract
Chemical exposure assessments are based on information collected via different methods, such as biomonitoring, personal monitoring, environmental monitoring and questionnaires. The vast amount of chemical-specific exposure information available from web-based databases, such as PubMed, is undoubtedly a great asset to the scientific community. However, manual retrieval of relevant published information is an extremely time consuming task and overviewing the data is nearly impossible. Here, we present the development of an automatic classifier for chemical exposure information. First, nearly 3700 abstracts were manually annotated by an expert in exposure sciences according to a taxonomy exclusively created for exposure information. Natural Language Processing (NLP) techniques were used to extract semantic and syntactic features relevant to chemical exposure text. Using these features, we trained a supervised machine learning algorithm to automatically classify PubMed abstracts according to the exposure taxonomy. The resulting classifier demonstrates good performance in the intrinsic evaluation. We also show that the classifier improves information retrieval of chemical exposure data compared to keyword-based PubMed searches. Case studies demonstrate that the classifier can be used to assist researchers by facilitating information retrieval and classification, enabling data gap recognition and overviewing available scientific literature using chemical-specific publication profiles. Finally, we identify challenges to be addressed in future development of the system.
PMID: 28257498 [PubMed - in process]
Prediction of advertisement preference by fusing EEG response and sentiment analysis.
Neural Netw. 2017 Feb 16;:
Authors: Gauba H, Kumar P, Roy PP, Singh P, Dogra DP, Raman B
Abstract
This paper presents a novel approach to predict the rating of video advertisements based on a multimodal framework combining physiological analysis of the user and the global sentiment rating available on the internet. We have fused Electroencephalogram (EEG) waves of the user and the corresponding global textual comments on the video to understand the user's preference more precisely. In our framework, the users were asked to watch the video advertisement while EEG signals were simultaneously recorded. Valence scores were obtained using self-report for each video. A higher valence corresponds to greater intrinsic attractiveness to the user. Furthermore, the multimedia data, which comprised the comments posted by global viewers, were retrieved and processed using a Natural Language Processing (NLP) technique for sentiment analysis. Textual contents from review comments were analyzed to obtain a score reflecting the sentiment toward the video. A regression technique based on Random Forest was used to predict the rating of an advertisement using EEG data. Finally, the EEG-based rating is combined with the NLP-based sentiment score to improve the overall prediction. The study was carried out using 15 video clips of advertisements available online. Twenty-five participants were involved in our study to analyze our proposed system. The results are encouraging and suggest that the proposed multimodal approach can achieve lower RMSE in rating prediction as compared to the prediction using only EEG data.
PMID: 28254237 [PubMed - as supplied by publisher]
Natural Language Processing in Oncology: A Review.
JAMA Oncol. 2016 Jun 01;2(6):797-804
Authors: Yim WW, Yetisgen M, Harris WP, Kwan SW
Abstract
IMPORTANCE: Natural language processing (NLP) has the potential to accelerate translation of cancer treatments from the laboratory to the clinic and will be a powerful tool in the era of personalized medicine. This technology can harvest important clinical variables trapped in the free-text narratives within electronic medical records.
OBSERVATIONS: Natural language processing can be used as a tool for oncological evidence-based research and quality improvement. Oncologists interested in applying NLP for clinical research can play pivotal roles in building NLP systems and, in doing so, contribute to both oncological and clinical NLP research. Herein, we provide an introduction to NLP and its potential applications in oncology, a description of specific tools available, and a review on the state of the current technology with respect to cancer case identification, staging, and outcomes quantification.
CONCLUSIONS AND RELEVANCE: More automated means of leveraging unstructured data from daily clinical practice is crucial as therapeutic options and access to individual-level health information increase. Research-minded oncologists may push the avenues of evidence-based research by taking advantage of the new technologies available with clinical NLP. As continued progress is made with applying NLP toward oncological research, incremental gains will lead to large impacts, building a cost-effective infrastructure for advancing cancer care.
PMID: 27124593 [PubMed - indexed for MEDLINE]
Automated discovery of safety and efficacy concerns for joint & muscle pain relief treatments from online reviews.
Int J Med Inform. 2017 Apr;100:108-120
Authors: Adams DZ, Gruss R, Abrahams AS
Abstract
OBJECTIVES: Product issues can cost companies millions in lawsuits and have devastating effects on a firm's sales, image and goodwill, especially in the era of social media. The ability for a system to detect the presence of safety and efficacy (S&E) concerns early on could not only protect consumers from injuries due to safety hazards, but could also mitigate financial damage to the manufacturer. Prior studies in the field of automated defect discovery have found industry-specific techniques appropriate to the automotive, consumer electronics, home appliance, and toy industries, but have not investigated pain relief medicines and medical devices. In this study, we focus specifically on automated discovery of S&E concerns in over-the-counter (OTC) joint and muscle pain relief remedies and devices.
METHODS: We select a dataset of over 32,000 records for three categories of Joint & Muscle Pain Relief treatments from Amazon's online product reviews, and train "smoke word" dictionaries which we use to score holdout reviews, for the presence of safety and efficacy issues. We also score using conventional sentiment analysis techniques.
RESULTS: Compared to traditional sentiment analysis techniques, we found that smoke term dictionaries were better suited to detect product concerns from online consumer reviews, and significantly outperformed the sentiment analysis techniques in uncovering both efficacy and safety concerns, across all product subcategories.
CONCLUSION: Our research can be applied to the healthcare and pharmaceutical industry in order to detect safety and efficacy concerns, reducing risks that consumers face using these products. These findings can be highly beneficial to improving quality assurance and management in joint and muscle pain relief.
PMID: 28241932 [PubMed - in process]
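As a rough illustration of the scoring step in the METHODS above, a review can be scored by summing the weights of the "smoke words" it contains. The abstract does not describe how the dictionaries are trained or how scores are computed, so the weights in `SMOKE_WEIGHTS` and the function `smoke_score` below are purely hypothetical placeholders.

```python
import re

# Hypothetical smoke-word weights; a real dictionary would be learned
# from reviews labeled for safety and efficacy concerns.
SMOKE_WEIGHTS = {"rash": 2.0, "burn": 1.5, "useless": 1.0}

def smoke_score(review, weights=SMOKE_WEIGHTS):
    """Sum the weights of all smoke words that occur in a review.

    Tokens are lowercased alphabetic runs; words missing from the
    dictionary contribute nothing to the score.
    """
    tokens = re.findall(r"[a-z']+", review.lower())
    return sum(weights.get(token, 0.0) for token in tokens)
```

Holdout reviews could then be ranked by this score so that the highest-scoring ones are inspected first.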
Text mining approach to predict hospital admissions using early medical records from the emergency department.
Int J Med Inform. 2017 Apr;100:1-8
Authors: Lucini FR, S Fogliatto F, C da Silveira GJ, L Neyeloff J, Anzanello MJ, de S Kuchenbecker R, D Schaan B
Abstract
OBJECTIVE: Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges.
DESIGN: We try different approaches for pre-processing of text records and to predict hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ(2) and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (Kernel linear) and Nu-Support Vector Machine (Kernel linear).
MEASUREMENTS: Prediction performance is evaluated by F1-scores. Precision and Recall values are also reported for all text mining methods tested.
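For readers unfamiliar with these measures, the following minimal sketch shows how precision, recall, and the F1-score relate to one another. The function name and the counts used in the example are hypothetical and are not results from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a stricter summary than a simple average.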
RESULTS: Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%.
CONCLUSIONS: The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams.
PMID: 28241931 [PubMed - in process]
Early recognition of multiple sclerosis using natural language processing of the electronic health record.
BMC Med Inform Decis Mak. 2017 Feb 28;17(1):24
Authors: Chase HS, Mitrani LR, Lu GG, Fulgieri DJ
Abstract
BACKGROUND: Diagnostic accuracy might be improved by algorithms that search patients' clinical notes in the electronic health record (EHR) for signs and symptoms of diseases such as multiple sclerosis (MS). The focus of this study was to determine whether patients with MS could be identified from their clinical notes prior to the initial recognition by their healthcare providers.
METHODS: An MS-enriched cohort of patients with well-established MS (n = 165) and controls (n = 545) was generated from the adult outpatient clinic. A random sample cohort was generated from randomly selected patients (n = 2289) from the same adult outpatient clinic, some of whom had MS (n = 16). Patients' notes were extracted from the data warehouse, and signs and symptoms were mapped to UMLS terms using MedLEE. Approximately 1000 MS-related terms occurred significantly more frequently in MS patients' notes than in controls' notes. Synonymous terms were manually clustered into 50 buckets and used as classification features. Patients were classified as MS or not using Naïve Bayes classification.
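The classification step can be illustrated with a small Naïve Bayes sketch over binary "bucket present / absent" features. The abstract does not state which Naïve Bayes variant or smoothing was used, so the Bernoulli formulation, the Laplace smoothing, the three-bucket feature layout, and the toy data below are all assumptions made for illustration.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary feature vectors.

    Returns {label: (log_prior, per_feature_presence_probabilities)}.
    """
    n_features = len(X[0])
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        # Laplace-smoothed probability that each feature is present.
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (math.log(prior), probs)
    return model


def predict(model, x):
    """Return the label with the highest log-posterior score for x."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for xj, pj in zip(x, probs):
            score += math.log(pj if xj else 1.0 - pj)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: 3 symptom buckets per patient (1 = mentioned in notes).
X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
y = ["MS", "MS", "MS", "control", "control", "control"]
model = train_bernoulli_nb(X, y)
```

In this toy setup, a patient whose notes mention all three buckets is scored as more likely to belong to the "MS" class, while a patient with none of them is scored as "control".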
RESULTS: Classification of patients known to have MS using notes of the MS-enriched cohort entered after the initial ICD9[MS] code yielded an ROC AUC, sensitivity, and specificity of 0.90 [0.87-0.93], 0.75 [0.66-0.82], and 0.91 [0.87-0.93], respectively. Similar classification accuracy was achieved using the notes from the random sample cohort. Classification of patients not yet known to have MS using notes of the MS-enriched cohort entered before the initial ICD9[MS] documentation identified 40% [23-59%] as having MS. Manual review of the EHR of 45 patients of the random sample cohort classified as having MS but lacking an ICD9[MS] code identified four who might have unrecognized MS.
CONCLUSIONS: Diagnostic accuracy might be improved by mining patients' clinical notes for signs and symptoms of specific diseases using NLP. Using this approach, we identified patients with MS early in the course of their disease which could potentially shorten the time to diagnosis. This approach could also be applied to other diseases often missed by primary care providers such as cancer. Whether implementing computerized diagnostic support ultimately shortens the time from earliest symptoms to formal recognition of the disease remains to be seen.
PMID: 28241760 [PubMed - in process]
Nematode neuropeptides as transgenic nematicides.
PLoS Pathog. 2017 Feb 27;13(2):e1006237
Authors: Warnock ND, Wilson L, Patten C, Fleming CC, Maule AG, Dalzell JJ
Abstract
Plant parasitic nematodes (PPNs) seriously threaten global food security. Conventionally, an integrated approach to PPN management has relied heavily on carbamate, organophosphate and fumigant nematicides, which are now being withdrawn over environmental health and safety concerns. This progressive withdrawal has left a significant shortcoming in our ability to manage these economically important parasites, and highlights the need for novel and robust control methods. Nematodes can assimilate exogenous peptides through retrograde transport along the chemosensory amphid neurons. Peptides can accumulate within cells of the central nerve ring and can elicit physiological effects when released to interact with receptors on adjoining cells. We have profiled bioactive neuropeptides from the neuropeptide-like protein (NLP) family of PPNs as novel nematicides, and have identified numerous discrete NLPs that negatively impact chemosensation, host invasion and stylet thrusting of the root knot nematode Meloidogyne incognita and the potato cyst nematode Globodera pallida. Transgenic secretion of these peptides from the rhizobacterium Bacillus subtilis and the terrestrial microalga Chlamydomonas reinhardtii reduces tomato infection levels by up to 90% when compared with controls. These data pave the way for the exploitation of nematode neuropeptides as a novel class of plant protective nematicide, using novel non-food transgenic delivery systems which could be deployed on farmer-preferred cultivars.
PMID: 28241060 [PubMed - as supplied by publisher]
Text mining a self-report back-translation.
Psychol Assess. 2016 06;28(6):750-64
Authors: Blanch A, Aluja A
Abstract
There are several recommendations about the routine to follow when back-translating self-report instruments in cross-cultural research. However, text mining methods have been generally ignored within this field. This work describes an innovative text mining application used to adapt a personality questionnaire to 12 different languages. The method is divided into 3 stages: a descriptive analysis of the available back-translated instrument versions, a dissimilarity assessment between the source language instrument and the 12 back-translations, and an assessment of item meaning equivalence. The suggested method helps improve the back-translation process of self-report instruments for cross-cultural research in 2 significant, intertwined ways. First, it defines a systematic approach to the back-translation issue, allowing for a more orderly and informed evaluation of the equivalence of different versions of the same instrument in different languages. Second, it provides more accurate instrument back-translations, which has direct implications for the reliability and validity of the instrument's test scores when used in different cultures/languages. In addition, this procedure can be extended to the back-translation of self-reports measuring psychological constructs in clinical assessment. Future research could refine the suggested methodology and use additional available text mining tools.
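As a rough illustration of what a dissimilarity assessment between a source item and its back-translation could look like, the sketch below computes a word-set Jaccard dissimilarity. The abstract does not name the measure the authors used, so the choice of Jaccard, the whitespace tokenization, and the function name are assumptions made purely for illustration.

```python
def jaccard_dissimilarity(text_a, text_b):
    """Return 1 - |A ∩ B| / |A ∪ B| over lowercase word sets.

    0.0 means identical vocabularies; 1.0 means no shared words.
    """
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical item wording, not taken from any real questionnaire.
source_item = "I enjoy going to parties"
back_translation = "I like going to parties"
d = jaccard_dissimilarity(source_item, back_translation)
```

A score near zero would suggest that the back-translation reuses the source wording, although word overlap alone cannot establish equivalence of meaning.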
PMID: 26302100 [PubMed - indexed for MEDLINE]