Data integration problem - How to integrate similar entities
13-10-2019
Question
I have a database table that contains near-duplicate rows: rows whose column values are nearly equal. I need to merge each group of corresponding rows into a single row.
For example, these two users (u1 and u2) should be merged:
u1 = User(name = "William Henry Gates III",
age = 55,
nationality = "american",
alma_mater = "Harvard Univesity")
u2 = User(name = "William Henry 'Bill' Gates III",
age = 55,
nationality = "America",
alma_mater = "Harvard U.")
I am thinking of using edit distance and stemming techniques. Can you suggest other algorithms or techniques? Are there any helpful libraries (preferably in Python or Java)?
Solution
Have you considered something like Refine (Google Refine, now OpenRefine)?
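If you want to try the edit-distance idea from the question by hand first, here is a minimal sketch in plain Python. It is not how Refine works internally, and it does not cover stemming. The User dataclass, the helper names, the unweighted averaging of fields, and the 0.6 threshold are all assumptions made for illustration; you would need to tune them on your real data.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    # Hypothetical record type mirroring the example in the question.
    name: str
    age: int
    nationality: str
    alma_mater: str


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b) -> float:
    """Similarity in [0, 1]; values are compared as lowercased strings."""
    a, b = str(a).lower().strip(), str(b).lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def record_similarity(r1, r2) -> float:
    """Unweighted mean of the per-field similarities (an assumed scoring rule)."""
    scores = [similarity(getattr(r1, f.name), getattr(r2, f.name))
              for f in fields(r1)]
    return sum(scores) / len(scores)


def is_duplicate(r1, r2, threshold: float = 0.6) -> bool:
    # The threshold is an arbitrary illustrative value, not a recommendation.
    return record_similarity(r1, r2) >= threshold


u1 = User("William Henry Gates III", 55, "american", "Harvard Univesity")
u2 = User("William Henry 'Bill' Gates III", 55, "America", "Harvard U.")

print(round(record_similarity(u1, u2), 3))
print(is_duplicate(u1, u2))
```

Comparing every pair of rows this way is quadratic in the table size, so on a large table you would normally compare only rows that already share some cheap key (for example the same age or the same first letters of the name).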