Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, in which raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is its use of the extracted rules to merge tokens that were split in earlier steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of both classification performance and the number of incorrectly segmented entities.
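The split-then-merge idea described above can be illustrated with a minimal sketch. It is not ChemTok's implementation: the regular expression, the two merge rules, and the function names (`split_tokens`, `merge_tokens`, `tokenize`) are hypothetical and hand-written for illustration, whereas the study extracts its rules mainly from the training data.

```python
import re

# Step 1 pattern: split aggressively into letter runs, digit runs,
# and single non-space symbols. (Illustrative assumption, not ChemTok's.)
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|\S")

# Hypothetical merge rules, each a pair of patterns:
# (last character of the left token, first character of the right token).
MERGE_RULES = [
    (r"\d", r"[-,]"),           # digit followed by hyphen or comma, e.g. "2" + ","
    (r"[-,]", r"[A-Za-z\d]"),   # hyphen or comma followed by letter or digit
]


def split_tokens(text):
    """Return (token, start, end) triples from the fine-grained split."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def merge_tokens(tokens):
    """Merge adjacent tokens (no whitespace between them) when a rule matches."""
    merged = []
    for tok, start, end in tokens:
        if merged:
            prev, prev_start, prev_end = merged[-1]
            adjacent = prev_end == start
            rule_hit = any(
                re.fullmatch(left, prev[-1]) and re.fullmatch(right, tok[0])
                for left, right in MERGE_RULES
            )
            if adjacent and rule_hit:
                # Extend the previous token instead of starting a new one.
                merged[-1] = (prev + tok, prev_start, end)
                continue
        merged.append((tok, start, end))
    return merged


def tokenize(text):
    """Split, then merge, and return only the token strings."""
    return [tok for tok, _, _ in merge_tokens(split_tokens(text))]


print(tokenize("2,4-dinitrophenol was added."))
```

In this sketch the chemical name survives as a single token while ordinary words and the final period stay separate. Rules this simple will also over-merge in places (a year followed by a comma, for example), which is one reason to derive the rules from annotated data rather than write them by hand.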