Semantic Web

The KIT Motion-Language Dataset.

Tue, 2016-12-20 08:02

Big Data. 2016 Dec;4(4):236-252

Authors: Plappert M, Mandery C, Asfour T

Abstract
Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, although there have been years of research in this area, no standardized and openly available data set exists to support the development and evaluation of such systems. We, therefore, propose the Karlsruhe Institute of Technology (KIT) Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our data set using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our data set or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting data set, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our data set an excellent choice that enables more transparent and comparable research in this important area.
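
The abstract does not say which language model underlies perplexity-based selection, so the following is only a minimal sketch of the general idea, assuming an add-one-smoothed unigram model estimated from all annotations. The function names, the motion identifiers and the example annotations are hypothetical. Motions whose annotations look unusual under the model (high perplexity), or that have no annotation at all, are ranked first for further annotation.

```python
import math
from collections import Counter


def unigram_perplexity(tokens, counts, total, vocab_size):
    """Perplexity of a token list under an add-one-smoothed unigram model."""
    log_prob = 0.0
    for tok in tokens:
        prob = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / len(tokens))


def rank_motions_by_perplexity(annotations):
    """Return motion ids sorted from highest to lowest mean annotation perplexity.

    annotations: dict mapping a motion id to a list of annotation strings.
    Assumption: the model is estimated on the same annotations it scores.
    """
    tokenized = {
        motion: [text.lower().split() for text in texts]
        for motion, texts in annotations.items()
    }
    counts = Counter(
        tok for texts in tokenized.values() for toks in texts for tok in toks
    )
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    scores = {}
    for motion, texts in tokenized.items():
        per_annotation = [
            unigram_perplexity(toks, counts, total, vocab_size)
            for toks in texts
            if toks
        ]
        # Motions without any annotation are treated as maximally uncertain.
        scores[motion] = (
            sum(per_annotation) / len(per_annotation)
            if per_annotation
            else float("inf")
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical example data, not taken from the data set.
example_annotations = {
    "walk_01": ["a person walks forward", "a person walks forward slowly"],
    "walk_02": ["a person walks forward"],
    "wave_01": ["human waves zzz left hand"],
}
ranking = rank_motions_by_perplexity(example_annotations)
```

In this toy example the motion with the atypical annotation is ranked first, which is the behaviour the selection step relies on.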

PMID: 27992262 [PubMed]

Categories: Literature Watch

Isomorphic semantic mapping of variant call format (VCF2RDF).

Tue, 2016-11-01 07:43

Bioinformatics. 2016 Oct 25;:

Authors: Penha ED, Iriabho E, Dussaq A, Magalhães de Oliveira D, Almeida JS

Abstract
The move of computational genomics workflows to Cloud Computing platforms is associated with a new level of integration and interoperability that challenges existing data representation formats. The Variant Call Format (VCF) is in a particularly sensitive position in that regard, with both clinical and consumer-facing analysis tools relying on this self-contained description of genomic variation in Next Generation Sequencing (NGS) results. In this report, we identify an isomorphic map between VCF and the reference Resource Description Framework (RDF). RDF is advanced by the World Wide Web Consortium (W3C) to enable representations of linked data that are both distributed and discoverable. The resulting ability to decompose VCF reports of genomic variation without loss of context addresses the need to modularize and govern NGS pipelines for Precision Medicine. Specifically, it provides the flexibility (i.e. the indexing) needed to support the wide variety of clinical scenarios and patient-facing governance where only part of the VCF data is fitting.
IMPLEMENTATION: Software libraries with a claim to be both domain-facing and consumer-facing have to pass the test of portability across the variety of devices that those consumers in fact adopt. That is, ideally the implementation should itself take place within the space defined by web technologies. Consequently, the isomorphic mapping function was implemented in JavaScript, and was tested in a variety of environments and devices, client and server side alike. These range from web browsers in mobile phones to the most popular micro service platform, NodeJS.
AVAILABILITY: The code is publicly available at https://github.com/ibl/VCFr, with a live deployment at http://ibl.github.io/VCFr/
CONTACT: Jonas.almeida@stonybrook.edu
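
To illustrate what an isomorphic (lossless, invertible) mapping between a VCF record and RDF-style triples can look like, here is a minimal sketch in Python. It is not the authors' JavaScript implementation: the namespace `http://example.org/vcf#`, the function names and the triple layout are assumptions made for illustration, and only the eight fixed VCF columns are handled.

```python
# Fixed columns of a VCF data line, in order.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

# Hypothetical namespace used only for this sketch.
BASE = "http://example.org/vcf#"


def vcf_line_to_triples(line, row_index):
    """Map one tab-separated VCF data line to (subject, predicate, object) triples."""
    fields = line.rstrip("\n").split("\t")
    subject = f"{BASE}record/{row_index}"
    # One triple per fixed column; genotype columns are ignored in this sketch.
    return [(subject, BASE + col, value) for col, value in zip(VCF_COLUMNS, fields)]


def triples_to_vcf_line(triples):
    """Inverse mapping: rebuild the VCF data line from its triples."""
    by_column = {pred[len(BASE):]: obj for _, pred, obj in triples}
    return "\t".join(by_column[col] for col in VCF_COLUMNS)


line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5"
triples = vcf_line_to_triples(line, 0)
round_trip = triples_to_vcf_line(triples)
```

Because every column becomes its own triple, a consumer can select only the statements it needs (for example, position and alternate allele) while the round trip back to the original line remains possible.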

PMID: 27797761 [PubMed - as supplied by publisher]

Categories: Literature Watch

Lessons learned in the generation of biomedical research datasets using Semantic Open Data technologies.

Tue, 2016-11-01 07:43

Stud Health Technol Inform. 2015;210:165-9

Authors: Legaz-García Mdel C, Miñarro-Giménez JA, Menárguez-Tortosa M, Fernández-Breis JT

Abstract
Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources. Such heterogeneity complicates not only the generation of research-oriented datasets but also their exploitation. In recent years, the Open Data paradigm has proposed new ways of making data available that facilitate sharing and integration. Open Data approaches may pursue the generation of content readable only by humans or by both humans and machines; the latter are the ones of interest in our work. The Semantic Web provides a natural technological space for data integration and exploitation and offers a range of technologies for generating not only Open Datasets but also Linked Datasets, that is, open datasets linked to other open datasets. According to Berners-Lee's classification, each open dataset can be given a rating between one and five stars. In recent years, we have developed and applied our SWIT tool, which automates the generation of semantic datasets from heterogeneous data sources. SWIT produces four-star datasets, given that the fifth star can be obtained only by the dataset being linked from external ones. In this paper, we describe how we have applied the tool in two projects related to health care records and orthology data, as well as the major lessons learned from such efforts.

PMID: 25991123 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Developing a modular architecture for creation of rule-based clinical diagnostic criteria.

Fri, 2016-10-28 06:51

BioData Min. 2016;9:33

Authors: Hong N, Pathak J, Chute CG, Jiang G

Abstract
BACKGROUND: With recent advances in computerized patient record systems, there is an urgent need for producing computable and standards-based clinical diagnostic criteria. Notably, constructing rule-based clinical diagnostic criteria has become one of the goals of the International Classification of Diseases (ICD)-11 revision. However, few studies have addressed building a unified architecture to support the computerization of diagnostic criteria. In this study, we present a modular architecture for enabling the creation of rule-based clinical diagnostic criteria leveraging Semantic Web technologies.
METHODS AND RESULTS: The architecture consists of two modules: an authoring module that utilizes a standards-based information model and a translation module that leverages Semantic Web Rule Language (SWRL). In a prototype implementation, we created a diagnostic criteria upper ontology (DCUO) that integrates the ICD-11 content model with the Quality Data Model (QDM). Using the DCUO, we developed a transformation tool that converts QDM-based diagnostic criteria into a SWRL representation. We evaluated the domain coverage of the upper ontology model using randomly selected diagnostic criteria from broad domains (n = 20). We also tested the transformation algorithms using 6 QDM templates for ontology population and 15 QDM-based criteria data for rule generation. As a result, the first draft of the DCUO contains 14 root classes, 21 subclasses, 6 object properties and 1 data property. Investigation Findings, and Signs and Symptoms are the two most commonly used element types. All 6 HQMF templates were successfully parsed and populated into their corresponding domain-specific ontologies, and 14 rules (93.3%) passed the rule validation.
CONCLUSION: Our efforts in developing and prototyping a modular architecture provide useful insight into how to build a scalable solution to support diagnostic criteria representation and computerization.

PMID: 27785153 [PubMed - in process]

Categories: Literature Watch

Using Semantic Web technologies for the generation of domain-specific templates to support clinical study metadata standards.

Thu, 2016-10-27 06:37

J Biomed Semantics. 2016;7:10

Authors: Jiang G, Evans J, Endle CM, Solbrig HR, Chute CG

Abstract
BACKGROUND: The Biomedical Research Integrated Domain Group (BRIDG) model is a formal domain analysis model for protocol-driven biomedical research, and serves as a semantic foundation for application and message development in the standards developing organizations (SDOs). The increasing sophistication and complexity of the BRIDG model requires new approaches to the management and utilization of the underlying semantics to harmonize domain-specific standards. The objective of this study is to develop and evaluate a Semantic Web-based approach that integrates the BRIDG model with ISO 21090 data types to generate domain-specific templates to support clinical study metadata standards development.
METHODS: We developed a template generation and visualization system based on an open source Resource Description Framework (RDF) store backend, a SmartGWT-based web user interface, and a "mind map" based tool for the visualization of generated domain-specific templates. We also developed a RESTful Web Service informed by the Clinical Information Modeling Initiative (CIMI) reference model for access to the generated domain-specific templates.
RESULTS: A preliminary usability study was performed, and all reviewers (n = 3) responded very positively to the evaluation questions on the usability of the system and its capability of meeting the system requirements (average score of 4.6).
CONCLUSIONS: Semantic Web technologies provide a scalable infrastructure and have great potential to enable computable semantic interoperability of models in the intersection of health care and clinical research.

PMID: 26949508 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Portal of medical data models: information infrastructure for medical research and healthcare.

Thu, 2016-10-27 06:37

Database (Oxford). 2016;2016:

Authors: Dugas M, Neuhaus P, Meidt A, Doods J, Storck M, Bruland P, Varghese J

Abstract
INTRODUCTION: Information systems are a key success factor for medical research and healthcare. Currently, most of these systems apply heterogeneous and proprietary data models, which impede data exchange and integrated data analysis for scientific purposes. Due to the complexity of medical terminology, the overall number of medical data models is very high. At present, the vast majority of these models are not available to the scientific community. The objective of the Portal of Medical Data Models (MDM, https://medical-data-models.org) is to foster sharing of medical data models.
METHODS: MDM is a registered European information infrastructure. It provides a multilingual platform for exchange and discussion of data models in medicine, both for medical research and healthcare. The system is developed in collaboration with the University Library of Münster to ensure sustainability. A web front-end enables users to search, view, download and discuss data models. Eleven different export formats are available (ODM, PDF, CDA, CSV, MACRO-XML, REDCap, SQL, SPSS, ADL, R, XLSX). MDM contents were analysed with descriptive statistics.
RESULTS: MDM contains 4387 current versions of data models (in total 10,963 versions). 2475 of these models belong to oncology trials. The most common keyword (n = 3826) is 'Clinical Trial'; most frequent diseases are breast cancer, leukemia, lung and colorectal neoplasms. Most common languages of data elements are English (n = 328,557) and German (n = 68,738). Semantic annotations (UMLS codes) are available for 108,412 data items, 2453 item groups and 35,361 code list items. Overall 335,087 UMLS codes are assigned with 21,847 unique codes. Few UMLS codes are used several thousand times, but there is a long tail of rarely used codes in the frequency distribution.
DISCUSSION: Expected benefits of the MDM portal are improved and accelerated design of medical data models by sharing best practice, more standardised data models with semantic annotation and better information exchange between information systems, in particular Electronic Data Capture (EDC) and Electronic Health Records (EHR) systems. Contents of the MDM portal need to be further expanded to reach broad coverage of all relevant medical domains. Database URL: https://medical-data-models.org.

PMID: 26868052 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

A new visual navigation system for exploring biomedical Open Educational Resource (OER) videos.

Wed, 2016-10-26 06:17

J Am Med Inform Assoc. 2016 Apr;23(e1):e34-41

Authors: Zhao B, Xu S, Lin S, Luo X, Duan L

Abstract
OBJECTIVE: Biomedical videos as open educational resources (OERs) are increasingly proliferating on the Internet. Unfortunately, seeking personally valuable content from among the vast corpus of quality yet diverse OER videos is nontrivial due to limitations of today's keyword- and content-based video retrieval techniques. To address this need, this study introduces a novel visual navigation system that facilitates users' information seeking from biomedical OER videos in mass quantity by interactively offering visual and textual navigational clues that are both semantically revealing and user-friendly.
MATERIALS AND METHODS: The authors collected and processed around 25 000 YouTube videos, which collectively last for a total length of about 4000 h, in the broad field of biomedical sciences for our experiment. For each video, its semantic clues are first extracted automatically through computationally analyzing audio and visual signals, as well as text either accompanying or embedded in the video. These extracted clues are subsequently stored in a metadata database and indexed by a high-performance text search engine. During the online retrieval stage, the system renders video search results as dynamic web pages using a JavaScript library that allows users to interactively and intuitively explore video content both efficiently and effectively.
RESULTS: The authors produced a prototype implementation of the proposed system, which is publicly accessible at https://patentq.njit.edu/oer. To examine the overall advantage of the proposed system for exploring biomedical OER videos, the authors further conducted a user study of a modest scale. The study results encouragingly demonstrate the functional effectiveness and user-friendliness of the new system for facilitating information seeking from and content exploration among massive biomedical OER videos.
CONCLUSION: Using the proposed tool, users can efficiently and effectively find videos of interest, precisely locate video segments delivering personally valuable information, as well as intuitively and conveniently preview essential content of a single or a collection of videos.

PMID: 26335986 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes.

Tue, 2016-10-25 06:04

Database (Oxford). 2016;2016:

Authors: Putman TE, Burgstaller-Muehlbacher S, Waagmeester A, Wu C, Su AI, Good BM

Abstract
The last 20 years of advancement in sequencing technologies have led to sequencing thousands of microbial genomes, creating mountains of genetic data. While efficiency in generating the data improves almost daily, applying meaningful relationships between taxonomic and genetic entities on this scale requires a structured and integrative approach. Currently, knowledge is distributed across a fragmented landscape of resources from government-funded institutions such as National Center for Biotechnology Information (NCBI) and UniProt to topic-focused databases like the ODB3 database of prokaryotic operons, to the supplemental table of a primary publication. A major drawback to large scale, expert-curated databases is the expense of maintaining and extending them over time. No entity apart from a major institution with stable long-term funding can consider this, and their scope is limited considering the magnitude of microbial data being generated daily. Wikidata is an openly editable, semantic web compatible framework for knowledge representation. It is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of information about microbial genomics. We are developing a microbial specific data model, based on Wikidata's semantic web compatibility, which represents bacterial species, strains and the gene and gene products that define them. Currently, we have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogenic bacteria Chlamydia trachomatis. Using this pathogen as an example, we explore complex interactions between the pathogen, its host, associated genes, other microbes, disease and drugs using the Wikidata SPARQL endpoint. In our next phase of development, we will add another 99 bacterial genomes and their gene and gene products, totaling ∼900,000 additional entities. This aggregation of knowledge will be a platform for community-driven collaboration, allowing the networking of microbial genetic data through the sharing of knowledge by both the data and domain expert.

PMID: 27022157 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Wikidata as a semantic framework for the Gene Wiki initiative.

Tue, 2016-10-25 06:04

Database (Oxford). 2016;2016:

Authors: Burgstaller-Muehlbacher S, Waagmeester A, Mitraka E, Turner J, Putman T, Leong J, Naik C, Pavlidis P, Schriml L, Good BM, Su AI

Abstract
Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/.
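
As a rough illustration of how the SPARQL endpoint mentioned above can be used, the sketch below builds a query and a request URL for looking up a gene item by its Entrez Gene identifier. It assumes the public endpoint at https://query.wikidata.org/sparql and the Wikidata property P351 for Entrez Gene IDs; the function names and the example identifier are hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

# Assumed public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gene_query(entrez_gene_id):
    """Build a SPARQL query that finds the item carrying a given Entrez Gene ID.

    Assumption: property P351 holds the Entrez Gene ID of a gene item.
    """
    return (
        "SELECT ?gene ?geneLabel WHERE {\n"
        f'  ?gene wdt:P351 "{entrez_gene_id}" .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def build_request_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_gene_query("1017")  # arbitrary example identifier
url = build_request_url(query)
```

The resulting URL can be fetched with any HTTP client; the JSON response lists matching items under the variable names used in the SELECT clause.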

PMID: 26989148 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts.

Sat, 2016-10-22 08:22

BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):350

Authors: Roy S, Curry BC, Madahian B, Homayouni R

Abstract
BACKGROUND: The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotation of miRNAs.
RESULTS: For approximately 900 human miRNAs indexed in miRBase, text documents were created by concatenating titles and abstracts of MEDLINE citations which refer to the miRNAs. The documents were parsed and a weighted term-by-miRNA frequency matrix was created, which was subsequently factorized via singular value decomposition to extract pair-wise cosine values between the term (keyword) and miRNA vectors in reduced rank semantic space. LSI enables derivation of both explicit and implicit associations between entities based on word usage patterns. Using miR2Disease as a gold standard, we found that LSI identified keyword-to-miRNA relationships with high accuracy. In addition, we demonstrate that pair-wise associations between miRNAs can be used to group them into categories which are functionally aligned. Finally, term ranking by querying the LSI space with a group of miRNAs enabled annotation of the clusters with functionally related terms.
CONCLUSIONS: LSI modeling of MEDLINE abstracts provides a robust and automated method for miRNA related knowledge discovery. The latest collection of miRNA abstracts and LSI model can be accessed through the web tool miRNA Literature Network (miRLiN) at http://bioinfo.memphis.edu/mirlin .
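
The LSI pipeline described in the RESULTS (term-by-miRNA matrix, singular value decomposition, cosine values in a reduced-rank space) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the matrix holds raw counts rather than the weighted frequencies used in the study, the terms and miRNA names are hypothetical, and the SVD is approximated with a simple power iteration so that the example needs only the standard library.

```python
import math


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(vec):
    return math.sqrt(dot(vec, vec))


def truncated_svd(matrix, rank, iterations=200):
    """Leading singular triplets (sigma, u, v) via power iteration with deflation."""
    matrix_t = transpose(matrix)
    n_cols = len(matrix_t)
    triplets = []
    for j in range(rank):
        v = [1.0 / (i + j + 1) for i in range(n_cols)]  # deterministic start vector
        for _ in range(iterations):
            for _, _, prev_v in triplets:  # remove directions already found
                proj = dot(v, prev_v)
                v = [a - proj * b for a, b in zip(v, prev_v)]
            w = matvec(matrix_t, matvec(matrix, v))
            scale = norm(w)
            v = [x / scale for x in w]
        av = matvec(matrix, v)
        sigma = norm(av)
        triplets.append((sigma, [x / sigma for x in av], v))
    return triplets


def cosine(x, y):
    denom = norm(x) * norm(y)
    return dot(x, y) / denom if denom else 0.0


# Hypothetical term-by-miRNA count matrix (rows: terms, columns: miRNA documents).
terms = ["apoptosis", "tumor", "cardiac", "hypertrophy"]
mirnas = ["miR-A", "miR-B", "miR-C"]
counts = [
    [2, 1, 0],
    [1, 2, 0],
    [0, 0, 2],
    [0, 0, 1],
]

triplets = truncated_svd(counts, rank=2)
# Coordinates in the reduced space: singular vectors scaled by singular values.
term_vecs = {t: [s * u[i] for s, u, _ in triplets] for i, t in enumerate(terms)}
mirna_vecs = {m: [s * v[d] for s, _, v in triplets] for d, m in enumerate(mirnas)}

tumor_vs_a = cosine(term_vecs["tumor"], mirna_vecs["miR-A"])
tumor_vs_c = cosine(term_vecs["tumor"], mirna_vecs["miR-C"])
```

In this toy matrix the term "tumor" ends up close to miR-A and nearly orthogonal to miR-C, which mirrors how keyword-to-miRNA associations are read off the reduced space.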

PMID: 27766940 [PubMed - in process]

Categories: Literature Watch

Data Integration and Mining for Synthetic Biology Design.

Sat, 2016-10-22 08:22

ACS Synth Biol. 2016 Oct 21;5(10):1086-1097

Authors: Mısırlı G, Hallinan J, Pocock M, Lord P, McLaughlin JA, Sauro H, Wipat A

Abstract
One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts, and their design constraints in the literature and numerous biological databases. However, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.

PMID: 27110921 [PubMed - in process]

Categories: Literature Watch

PopHR: a knowledge-based platform to support integration, analysis, and visualization of population health data.

Tue, 2016-10-18 07:17

Ann N Y Acad Sci. 2016 Oct 17;:

Authors: Shaban-Nejad A, Lavigne M, Okhmatovskaia A, Buckeridge DL

Abstract
Population health decision makers must consider complex relationships between multiple concepts measured with differential accuracy from heterogeneous data sources. Population health information systems are currently limited in their ability to integrate data and present a coherent portrait of population health. Consequently, these systems can provide only basic support for decision makers. The Population Health Record (PopHR) is a semantic web application that automates the integration and extraction of massive amounts of heterogeneous data from multiple distributed sources (e.g., administrative data, clinical records, and survey responses) to support the measurement and monitoring of population health and health system performance for a defined population. The design of the PopHR draws on the theories of the determinants of health and evidence-based public health to harmonize and explicitly link information about a population with evidence about the epidemiology and control of chronic diseases. Organizing information in this manner and linking it explicitly to evidence is expected to improve decision making related to the planning, implementation, and evaluation of population health and health system interventions. In this paper, we describe the PopHR platform and discuss the architecture, design, key modules, and its implementation and use.

PMID: 27750378 [PubMed - as supplied by publisher]

Categories: Literature Watch

Semantic Indexing of Medical Learning Objects: Medical Students' Usage of a Semantic Network.

Thu, 2016-10-13 09:00

JMIR Med Educ. 2015 Nov 11;1(2):e16

Authors: Tix N, Gießler P, Ohnesorge-Radtke U, Spreckelsen C

Abstract
BACKGROUND: The Semantically Annotated Media (SAM) project aims to provide a flexible platform for searching, browsing, and indexing medical learning objects (MLOs) based on a semantic network derived from established classification systems. Primarily, SAM supports the Aachen emedia skills lab, but SAM is ready for indexing distributed content, and the Simple Knowledge Organization System (SKOS) standard provides a means for easily upgrading or even exchanging SAM's semantic network. There is a lack of research addressing the usability of MLO indexes or search portals like SAM and the user behavior with such platforms.
OBJECTIVE: The purpose of this study was to assess the usability of SAM by investigating characteristic user behavior of medical students accessing MLOs via SAM.
METHODS: In this study, we chose a mixed-methods approach. Lean usability testing was combined with usability inspection by having the participants complete four typical usage scenarios before filling out a questionnaire. The questionnaire was based on the IsoMetrics usability inventory. Direct user interaction with SAM (mouse clicks and pages accessed) was logged.
RESULTS: The study analyzed the typical usage patterns and habits of students using a semantic network for accessing MLOs. Four scenarios capturing characteristics of typical tasks to be solved by using SAM yielded high ratings of usability items and showed good results concerning the consistency of indexing by different users. Long-tail phenomena emerged, as is typical for a collaborative Web 2.0 platform. Suitable but nonetheless rarely used keywords were assigned to MLOs by some users.
CONCLUSIONS: It is possible to develop a Web-based tool with high usability and acceptance for indexing and retrieval of MLOs. SAM can be applied to indexing multicentered repositories of MLOs collaboratively.

PMID: 27731860 [PubMed - in process]

Categories: Literature Watch

Inferring unknown biological functions by integration of GO annotations and gene expression data.

Tue, 2016-10-11 08:27

IEEE/ACM Trans Comput Biol Bioinform. 2016 Oct 07;:

Authors: Leale G, Baya A, Milone D, Granitto P, Stegmayer G

Abstract
Characterizing genes with semantic information is an important step in the description of gene products. Although the complete genomes of many organisms have already been sequenced, not all of their genes have known biological functions. Since experimentally studying the functions of those genes, one by one, would be infeasible, new computational methods for gene function inference are needed. We present here a novel computational approach for inferring biological function for a set of genes with previously unknown function, given a set of genes with well-known information. This approach is based on the premise that genes with similar behaviour should be grouped together. This is known as the guilt-by-association principle. Thus, it is possible to take advantage of clustering techniques to obtain groups of unknown genes that are co-clustered with genes that have well-known semantic information (GO annotations). Meaningful knowledge to infer unknown semantic information can therefore be provided by these well-known genes. We provide a method to explore the potential function of new genes according to those currently annotated. The results obtained indicate that the proposed approach could be a useful and effective tool when used by biologists to guide the inference of biological functions for recently discovered genes. Our work sets an important landmark in the field of identifying unknown gene functions through clustering, using an external source of biological input. A simple web interface to this proposal can be found at http://fich.unl.edu.ar/sinc/webdemo/gamma-am/.
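
The guilt-by-association step can be sketched in a few lines. This is a simplified illustration, not the authors' method: it assumes cluster assignments are already available from some clustering algorithm, and it transfers GO terms to unannotated genes by counting how often each term occurs among annotated genes in the same cluster. The gene names, cluster labels and function name are hypothetical, and the GO identifiers serve only as example labels.

```python
from collections import Counter, defaultdict


def infer_annotations(cluster_of, known_go):
    """Suggest GO terms for unannotated genes from annotated co-clustered genes.

    cluster_of: dict mapping gene -> cluster label (from any clustering method).
    known_go:   dict mapping annotated gene -> set of GO term ids.
    Returns a dict mapping each unannotated gene to GO terms ranked by vote count.
    """
    members = defaultdict(list)
    for gene, cluster in cluster_of.items():
        members[cluster].append(gene)

    inferred = {}
    for gene, cluster in cluster_of.items():
        if gene in known_go:
            continue  # already annotated, nothing to infer
        votes = Counter(
            term
            for other in members[cluster]
            if other in known_go
            for term in known_go[other]
        )
        # Sort by descending vote count, then by term id for a stable order.
        inferred[gene] = [t for t, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]
    return inferred


# Hypothetical example data.
cluster_of = {"g1": 0, "g2": 0, "g3": 0, "u1": 0, "g4": 1, "u2": 1}
known_go = {
    "g1": {"GO:0006915"},
    "g2": {"GO:0006915", "GO:0008283"},
    "g3": {"GO:0006915"},
    "g4": {"GO:0006412"},
}
inferred = infer_annotations(cluster_of, known_go)
```

Ranking by vote count is one simple way to turn co-clustering into candidate functions; a biologist would still need to review the suggestions.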

PMID: 27723603 [PubMed - as supplied by publisher]

Categories: Literature Watch

Cross-Species Analysis of Gene Expression and Function in Prefrontal Cortex, Hippocampus and Striatum.

Sat, 2016-10-08 10:50

PLoS One. 2016;11(10):e0164295

Authors: Chen W, Xia X, Song N, Wang Y, Zhu H, Deng W, Kong Q, Pan X, Qin C

Abstract
BACKGROUND: The mouse has been extensively used as a tool for investigating the onset and development of human neurological disorders. As a first step towards constructing a transgenic mouse model of human brain lesions, it is of fundamental importance to clarify the similarities and divergences in genetic background between non-diseased human and mouse brain tissues.
METHODS: Based on large-scale integrated microarray data, we systematically compared the transcriptomes of three anatomically distinct brain regions, prefrontal cortex (PFC), hippocampus (HIP) and striatum (STR), across human and mouse. The widely used DAVID web server was used to decipher the biological functions of the highly expressed genes, which were identified using a previously reported approach. Venn analysis was used to depict the overlapping ratios of the notably enriched biological process (BP) terms (one-tailed Fisher's exact test and Benjamini correction; adjusted p < 0.01) between two brain tissues. GOSemSim, an R package, was selected to perform GO semantic similarity analysis. Next, we adjusted the signal intensities of orthologous genes by the total signals in all samples within species, and used one minus Pearson's correlation coefficient to assess the expression distance. Hierarchical clustering and principal component analysis (PCA) were selected for expression pattern analysis. Lineage-specific expressed orthologous genes were identified by comparison of the most extreme sub-datasets across species and further verified using reverse transcription PCR (RT-PCR) and quantitative real-time PCR (qRT-PCR).
RESULTS: We found that the number of significantly enriched BP terms of the highly expressed genes in human brain regions is larger than that in the corresponding mouse brain regions. The main BP terms involved in human brain tissues, associated with protein-membrane targeting and selenium metabolism, are species-specific. The overlapping ratios of all the significantly enriched BP terms between any two brain tissues across species are lower than those within species, but the pairwise semantic similarities are very high between any two brain tissues from either human or mouse. Hierarchical clustering analysis shows that the biological functions of the highly expressed genes in brain tissues are more consistent within species than between species, whereas the expression patterns of orthologous genes are evidently conserved between equivalent human and mouse brain tissues. In addition, we identified four orthologous genes (COX5B, WIF1, SLC4A10 and PLA2G7) that are species-specific, which have been widely studied and confirmed to be closely linked with neurophysiological and pathological functions.
CONCLUSION: Our study highlights the similarities and divergences in gene function and expression between corresponding human and mouse brain regions, including PFC, HIP and STR.
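
The expression distance named in the METHODS above, one minus Pearson's correlation coefficient, can be written out as a short sketch. This only illustrates the formula; the function name `pearson_distance` and the toy expression profiles are assumptions, and the normalization by total signal described in the abstract is not reproduced here.

```python
import math


def pearson_distance(x, y):
    """Return 1 - r, where r is Pearson's correlation between x and y.

    The result is 0 for perfectly correlated profiles and 2 for
    perfectly anti-correlated ones.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centred products and centred squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Toy expression profiles over four hypothetical samples.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [2.0, 4.0, 6.0, 8.0]   # same shape as profile_a, scaled
profile_c = [4.0, 3.0, 2.0, 1.0]   # reversed shape

d_same = pearson_distance(profile_a, profile_b)
d_opposite = pearson_distance(profile_a, profile_c)
```

Because the distance depends only on the shape of a profile and not on its scale, it is a common choice as input to hierarchical clustering of expression data.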

PMID: 27716781 [PubMed - in process]

Categories: Literature Watch

Web Video Event Recognition by Semantic Analysis from Ubiquitous Documents.

Thu, 2016-10-06 06:50

IEEE Trans Image Process. 2016 Sep 27;:

Authors: Yu L, Yang Y, Huang Z, Wang P, Song J, Shen H

Abstract
In recent years, the task of event recognition from videos has attracted increasing interest in the multimedia area. While most existing research has focused on exploring visual cues to handle relatively small-granular events, it is difficult to directly analyse video content without any prior knowledge. Therefore, combining visual and semantic analysis is a natural way to approach video event understanding. In this paper, we study the problem of web video event recognition, where web videos often describe large-granular events and carry limited textual information. Key challenges include how to accurately represent event semantics from incomplete textual information and how to effectively explore the correlation between visual and textual cues for video event understanding. We propose a novel framework to perform complex event recognition from web videos. In order to compensate for the insufficient expressive power of visual cues, we construct an event knowledge base by deeply mining semantic information from ubiquitous web documents. This event knowledge base is capable of describing each event with comprehensive semantics. By utilizing this base, the textual cues for a video can be significantly enriched. Furthermore, we introduce a two-view adaptive regression model which explores the intrinsic correlation between the visual and textual cues of the videos to learn reliable classifiers. Extensive experiments on two real-world video datasets show the effectiveness of our proposed framework and prove that the event knowledge base indeed helps improve the performance of web video event recognition.

PMID: 27705859 [PubMed - as supplied by publisher]

Categories: Literature Watch

A Real-Time Web of Things Framework with Customizable Openness Considering Legacy Devices.

Sat, 2016-10-01 07:30

Sensors (Basel). 2016;16(10)

Authors: Zhao S, Yu L, Cheng B

Abstract
With the development of the Internet of Things (IoT), resources and applications based on it have emerged on a large scale. However, most efforts are "silo" solutions in which devices and applications are tightly coupled. Infrastructures are needed to connect sensors to the Internet, break open the current application silos and move to a horizontal application mode. Based on the concept of the Web of Things (WoT), many infrastructures have been proposed to integrate the physical world with the Web. However, issues such as the lack of real-time guarantees, the lack of fine-grained control of data, and the absence of explicit solutions for integrating heterogeneous legacy devices hinder their widespread and practical use. To address these issues, this paper proposes a WoT resource framework that provides the infrastructure for the customizable openness and sharing of users' data and resources while ensuring the real-time behavior of their own applications. The proposed framework is validated by actual systems and experimental evaluations.

PMID: 27690038 [PubMed - as supplied by publisher]

Categories: Literature Watch

Standardized data collection to build prediction models in oncology: a prototype for rectal cancer.

Sat, 2016-10-01 07:30

Future Oncol. 2016 Jan;12(1):119-36

Authors: Meldolesi E, van Soest J, Damiani A, Dekker A, Alitto AR, Campitelli M, Dinapoli N, Gatta R, Gambacorta MA, Lanzotti V, Lambin P, Valentini V

Abstract
Advances in diagnostic and treatment technology are responsible for a remarkable transformation in the concept of internal medicine, with the establishment of a new idea of personalized medicine. Inter- and intra-patient tumor heterogeneity and the complexity of clinical outcomes and/or treatment toxicity justify the effort to develop predictive models for decision support systems. However, the number of evaluated variables coming from multiple disciplines (oncology, computer science, bioinformatics, statistics, genomics and imaging, among others) could be very large, thus making traditional statistical analysis difficult to exploit. Automated data-mining processes and machine learning approaches can be a solution for organizing the massive amount of data and unravelling important interactions. The purpose of this paper is to describe a strategy to collect and analyze data properly for decision support and to introduce the concept of an 'umbrella protocol' within the framework of 'rapid learning healthcare'.

PMID: 26674745 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.

Fri, 2016-09-30 07:02

Bioinformatics. 2016 Feb 1;32(3):477-9

Authors: Mazandu GK, Chimusa ER, Mbiyavanga M, Mulder NJ

Abstract
SUMMARY: Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from current high-throughput genome-wide applications, but also of allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs.
AVAILABILITY AND IMPLEMENTATION: A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Python on Linux and released as free software under the GNU General Public Licence.
CONTACT: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
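
To make the notion of an information content-based semantic similarity measure concrete, the following is a minimal sketch of one well-known measure of this family, Resnik similarity, on a toy ontology. It does not use A-DaGO-Fun or its API. The term names, the `PARENTS` graph and the `DIRECT_COUNTS` annotation counts are invented for illustration and are not real GO data.

```python
import math

# Toy ontology (hypothetical terms): each term maps to its direct parents.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "metabolism": 2,
    "signaling": 4,
    "lipid_metabolism": 1,
    "sugar_metabolism": 1,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t), where p(t) counts annotations to t or any descendant."""
    total = sum(DIRECT_COUNTS.values())
    propagated = sum(
        count for t, count in DIRECT_COUNTS.items() if term in ancestors(t)
    )
    return -math.log(propagated / total)


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


sim_siblings = resnik("lipid_metabolism", "sugar_metabolism")
sim_distant = resnik("lipid_metabolism", "signaling")
```

In this toy graph the two metabolism terms share the informative ancestor `metabolism`, while `lipid_metabolism` and `signaling` share only `root`, whose information content is zero.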

PMID: 26476781 [PubMed - indexed for MEDLINE]

Categories: Literature Watch
