Problem

I have a database table that contains very similar rows: their column values are nearly equal. I need to merge each group of corresponding rows into one single row.

For example, these two users (u1 and u2) should be merged:

 u1 = User(name = "William Henry Gates III",
           age = 55,
           nationality = "american",
           alma_mater = "Harvard Univesity")

 u2 = User(name = "William Henry 'Bill' Gates III",
           age = 55,
           nationality = "America",
           alma_mater = "Harvard U.")

I am thinking of using some edit distance and stemming techniques. Are there other algorithms or techniques you would suggest? Are there any helpful libraries (preferably in Python or Java)?
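
To make the edit-distance idea concrete, here is a minimal sketch using only Python's standard library. Note that `difflib.SequenceMatcher` returns a similarity ratio based on matching blocks, not a true Levenshtein distance, so it stands in for "some edit distance" here. The dict representation of the users, the helper names `field_similarity` and `record_similarity`, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Similarity in [0, 1] between two values, compared as lowercased strings."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def record_similarity(r1, r2):
    """Average field similarity over the keys that both records share."""
    keys = r1.keys() & r2.keys()
    if not keys:
        return 0.0
    return sum(field_similarity(r1[k], r2[k]) for k in keys) / len(keys)


# The two users from the example, represented as plain dicts (an assumption).
u1 = {"name": "William Henry Gates III", "age": 55,
      "nationality": "american", "alma_mater": "Harvard Univesity"}
u2 = {"name": "William Henry 'Bill' Gates III", "age": 55,
      "nationality": "America", "alma_mater": "Harvard U."}

score = record_similarity(u1, u2)

# Hypothetical cutoff; a real value would need tuning on labeled pairs.
THRESHOLD = 0.6
is_probable_duplicate = score >= THRESHOLD
```

Comparing every pair of rows this way is quadratic in the table size, so on larger tables it is usually combined with some cheap grouping step that limits which pairs get compared.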


Solution

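Have you considered something like Refine (Google Refine, now OpenRefine)? It includes clustering features for finding near-duplicate values.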
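
Assuming "Refine" here means Google Refine (now OpenRefine), one of the clustering approaches it offers is key collision: each value is reduced to a normalized key, and values sharing a key are grouped together. The following is a simplified sketch of that idea, not OpenRefine's actual implementation; the function names `fingerprint` and `cluster_by_fingerprint` and the sample names are assumptions for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a key: ASCII-fold, lowercase, strip punctuation,
    then join the sorted set of whitespace-separated tokens."""
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster_by_fingerprint(values):
    """Group values that share a fingerprint; keep only groups of 2 or more."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data.
names = [
    "Gates, William Henry III",
    "William Henry Gates III",
    "william henry gates iii ",
    "Steve Jobs",
]
clusters = cluster_by_fingerprint(names)
```

Key collision only catches differences in case, punctuation, and token order. A variant with an extra token, such as "William Henry 'Bill' Gates III", gets a different key, so it would still need a nearest-neighbor comparison based on a string distance.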
