In data integration, entity resolution (ER) is an important technique for improving data quality. Existing research typically assumes that the target dataset contains only string-type data and uses a single similarity metric. For larger, high-dimensional datasets, redundant information must be verified using traditional blocking or windowing techniques. In this work, we propose a novel ER method that takes a hybrid approach, combining type-based multiblocks, varying window sizes, and more flexible similarity metrics. In our new ER workflow, we reduce the search space for entity pairs by constraining on redundant attributes and matching likelihood. We develop a reference implementation of the proposed approach and validate its performance using a real-life dataset from an Internet of Things project. We evaluate the data processing system using five standard metrics: effectiveness, efficiency, accuracy, recall, and precision. Experimental results indicate that the proposed approach could be a promising alternative for entity resolution and could feasibly be applied to real-world data cleaning for large datasets.
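To make the general idea concrete, the following is a minimal Python sketch of blocking, windowing, and type-aware similarity working together. It is not the authors' reference implementation: the function names (`similarity`, `record_similarity`, `candidate_pairs`, `resolve`), the record layout, the 0.85 threshold, and the fixed (rather than varying) window size are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Type-aware similarity in [0, 1] (illustrative, not the paper's metric)."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # Numeric values: relative closeness instead of string comparison.
        scale = max(abs(a), abs(b))
        return 1.0 if scale == 0 else max(0.0, 1.0 - abs(a - b) / scale)
    # Everything else: case-insensitive string ratio from the standard library.
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def record_similarity(r1, r2, fields):
    """Unweighted mean of the per-field similarities."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)


def candidate_pairs(records, block_key, sort_key, window):
    """Block records, then slide a fixed-size window over each sorted block."""
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(block_key(rec), []).append(idx)

    pairs = set()
    for members in blocks.values():
        members.sort(key=lambda i: sort_key(records[i]))
        for pos, i in enumerate(members):
            # Compare each record only with the next (window - 1) neighbours.
            for j in members[pos + 1 : pos + window]:
                pairs.add((min(i, j), max(i, j)))
    return pairs


def resolve(records, fields, block_key, sort_key, window=3, threshold=0.85):
    """Return index pairs whose mean similarity reaches the assumed threshold."""
    return sorted(
        (i, j)
        for i, j in candidate_pairs(records, block_key, sort_key, window)
        if record_similarity(records[i], records[j], fields) >= threshold
    )
```

Called with, say, `block_key=lambda r: r["type"]` and `sort_key=lambda r: r["name"].lower()`, only records that share a type and sit close together in the sorted order are ever compared, which is how blocking and windowing shrink the search space before any similarity is computed.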