PACBB'2017 - Sérgio Matos



Date 2017/06/21
Title Improving Document Prioritization for Protein-Protein Interaction Extraction Using Shallow Linguistics and Word Embeddings
Speaker Sérgio Matos
Event PACBB'2017
Location Porto
Country Portugal

Abstract: Understanding biological processes, such as those associated with disease or pharmacological action, requires the analysis of large amounts of interconnected information. Protein interaction networks form part of this puzzle, and extracting this information from the scientific literature is an important but challenging task. In this work, we present a supervised classification approach for identifying and ranking documents in the literature that contain information on protein interactions. We studied the use of word embeddings together with simple chunking features, and show that combining these features with a baseline bag-of-words representation can lead to similar or even improved results compared with features based on deep linguistic parsing. When applied to the BioCreative III Article Classification Task dataset, our approach achieves an area under the precision-recall curve of 0.70 and a Matthews correlation coefficient of 0.56.
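
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how they can be computed for a ranked list of documents. It is not the evaluation code used in the work: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and average precision is used here as one common estimate of the area under the precision-recall curve, which may differ from the interpolation used in the official task evaluation.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each relevant document."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy data: 1 = article describes a protein interaction.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # classifier confidence per article
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("MCC:", round(matthews_corrcoef(y_true, y_pred), 3))
print("Average precision:", round(average_precision(y_true, scores), 3))
```

The Matthews correlation coefficient depends on the thresholded predictions, whereas average precision depends only on the ordering of the scores, which is why a document prioritization task typically reports both.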