In data integration, entity resolution (ER) is an important technique for improving data quality. Existing research typically assumes that the target dataset contains only string-type data and uses a single similarity metric. For larger, high-dimensional datasets, redundant information must be verified using traditional blocking or windowing techniques. In this work, we propose a novel ER method that takes a hybrid approach, combining type-based multiblocks, varying window sizes, and more flexible similarity metrics. In our new ER workflow, we reduce the search space for entity pairs by constraining it on redundant attributes and matching likelihood. We develop a reference implementation of the proposed approach and validate its performance using a real-life dataset from an Internet of Things project. We evaluate the data processing system using five standard metrics: effectiveness, efficiency, accuracy, recall, and precision. Experimental results indicate that the proposed approach could be a promising alternative for entity resolution and could feasibly be applied to real-world data cleaning for large datasets.
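To make the ideas named in the abstract more concrete, here is a minimal Python sketch of how blocking, a sliding comparison window, and type-aware similarity can fit together. It is not the authors' reference implementation: the record fields (`id`, `kind`, `name`, `reading`), the function names, the fixed window size, and the 0.85 threshold are all assumptions chosen for illustration, and the window here is constant rather than varying.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Type-aware similarity in [0, 1] (illustrative, not the paper's metric)."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # Numeric fields: relative closeness, 1.0 when the values are equal.
        scale = max(abs(a), abs(b))
        return 1.0 if scale == 0 else max(0.0, 1.0 - abs(a - b) / scale)
    # Everything else is compared as a case-insensitive string.
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def record_similarity(r1, r2, fields):
    """Unweighted mean of the per-field similarities."""
    return sum(field_similarity(r1[f], r2[f]) for f in fields) / len(fields)


def resolve(records, block_key, sort_key, fields, window=3, threshold=0.85):
    """Return a set of (id, id) pairs judged to be the same entity."""
    # Blocking: only records that share the blocking key are ever compared.
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[block_key]].append(rec)

    matches = set()
    for block in blocks.values():
        # Windowing: sort the block, then compare each record only with
        # the next (window - 1) records in sorted order.
        ordered = sorted(block, key=lambda r: str(r[sort_key]).lower())
        for i, left in enumerate(ordered):
            for right in ordered[i + 1 : i + window]:
                if record_similarity(left, right, fields) >= threshold:
                    matches.add(tuple(sorted((left["id"], right["id"]))))
    return matches


# Hypothetical IoT-style records with mixed string and numeric fields.
records = [
    {"id": 1, "kind": "sensor", "name": "Temp Sensor A1", "reading": 21.5},
    {"id": 2, "kind": "sensor", "name": "temp sensor a1", "reading": 21.4},
    {"id": 3, "kind": "sensor", "name": "Humidity Probe", "reading": 55.0},
    {"id": 4, "kind": "gateway", "name": "Temp Sensor A1", "reading": 21.5},
]

matches = resolve(records, "kind", "name", ["name", "reading"])
print(matches)
```

In this sketch, records 1 and 2 are paired because they share a block and are nearly identical on both fields, while record 4 is never compared with them because it falls into a different block. That is the sense in which blocking and windowing shrink the search space relative to comparing every pair.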
from #AlexandrosSfakianakis via Alexandros G.Sfakianakis on Inoreader http://ift.tt/2Gq3pCS
via IFTTT