Find rows repeated in no particular order and show result sorted by number of repetitions
-
26-12-2019 - |
Pregunta
I was able to solve this problem using Python but I would need to implement the solution in Excel itself as well so that I can use graphics to represent the results easily.
Given this table:
b a c
c a b
a c b
a c
a c d
b c a
d c a
I would like to obtain a list sorted by the number of times that a row is the repeated (in no particular order) in the table.
- So this would count as repeated rows: "a b c", "c b a", "a c b"
- But this wouldn't: "a b c", "b c", "b", "a b", "a c"
So, the output I'm looking for would be something like:
1st place: "b+a+c" found 4 times
2nd place: "a+c+d" found 2 twice
3rd place: "a+c" found once
The output has to say "b+a+c" even if it is also counting "a+b+c", "c+b+a" and so on... because "b+a+c" was the first one of all the other subsequent repetitions.
Would anyone be able to show me the correct way to approach the problem?
Solución
I would use a Class module and a collection object. The class module would consist of two arrays and a counter. The first array is the row in its original order; the second array is the row in sorted order. The sorted order would be used as the Key for the collection object. If you try to add a collection object where the Key already exists, it will cause an error. Trap the error and add one to the counter.
Then for the results, you would retrieve the original entries from the "original" array; and the counter. Sort on the counter and you have your results.
Here is an example of VBA code to accomplish the above.
First, insert a Class module and rename it RowEntries
Option Explicit
Private pOriginal() As Variant
Private pSorted() As Variant
Private pCount As Long
Public Property Get Original() As Variant
Original = pOriginal
End Property
Public Property Let Original(Value As Variant)
pOriginal = Value
End Property
Public Property Get Sorted() As Variant
Sorted = pSorted
End Property
Public Property Let Sorted(Value As Variant)
pSorted = Value
End Property
Public Property Get Count() As Long
Count = pCount
End Property
Public Property Let Count(Value As Long)
pCount = Value
End Property
Then insert a regular module. This code assumes your source data is the CurrentRegion around A1; and the results will go several columns to the right. These algorithms are easily changed.
Option Explicit
Option Compare Text 'To make comparison case insensitive, if you want
Sub RankRows()
Dim V As Variant, VtoSort As Variant
Dim vRes() As Variant
Dim cRowEntries As RowEntries
Dim colRowEntries As Collection
Dim sKey As String, S As String
Dim I As Long
Dim rSrc As Range, rRes As Range 'Location for Results
Set rSrc = Range("A1").CurrentRegion
Set rRes = rSrc.Offset(columnoffset:=rSrc.Columns.Count + 3).Resize(1, 2)
V = rSrc
Set colRowEntries = New Collection
On Error Resume Next
For I = 1 To UBound(V)
Set cRowEntries = New RowEntries
With cRowEntries
.Original = WorksheetFunction.Index(V, I, 0)
VtoSort = .Original
Quick_Sort VtoSort, LBound(VtoSort), UBound(VtoSort)
.Sorted = VtoSort
.Count = 1
sKey = CStr(Join(.Sorted, ", "))
colRowEntries.Add cRowEntries, sKey
If Err.Number <> 0 Then
Err.Clear
With colRowEntries(sKey)
.Count = .Count + 1
End With
End If
End With
Next I
On Error GoTo 0
'populate results array
ReDim vRes(1 To colRowEntries.Count, 1 To 2)
For I = 1 To colRowEntries.Count
With colRowEntries(I)
vRes(I, 1) = Join(.Original, "+")
'remove trailing delimiters
Do While Right(vRes(I, 1), 1) = "+"
vRes(I, 1) = Left(vRes(I, 1), Len(vRes(I, 1)) - 1)
Loop
vRes(I, 2) = .Count
End With
Next I
Set rRes = rRes.Resize(rowsize:=UBound(vRes), columnsize:=UBound(vRes, 2))
With rRes
.EntireColumn.Clear
.Value = vRes
.Sort key1:=rRes.Columns(2), order1:=xlDescending, Header:=xlNo
End With
V = rRes
ReDim vRes(1 To UBound(V), 1 To 1)
For I = 1 To UBound(V)
Select Case V(I, 2)
Case 1
S = "once"
Case 2
S = "twice"
Case Else
S = V(I, 2) & " times"
End Select
vRes(I, 1) = OrdinalNum(I) & " place: """ & V(I, 1) & """ found " & S
Next I
rRes.EntireColumn.Clear
rRes.Resize(columnsize:=1) = vRes
rRes.EntireColumn.AutoFit
End Sub
Sub Quick_Sort(ByRef SortArray As Variant, ByVal first As Long, ByVal last As Long)
Dim Low As Long, High As Long
Dim Temp As Variant, List_Separator As Variant
Low = first
High = last
List_Separator = SortArray((first + last) / 2)
Do
Do While (SortArray(Low) < List_Separator)
Low = Low + 1
Loop
Do While (SortArray(High) > List_Separator)
High = High - 1
Loop
If (Low <= High) Then
Temp = SortArray(Low)
SortArray(Low) = SortArray(High)
SortArray(High) = Temp
Low = Low + 1
High = High - 1
End If
Loop While (Low <= High)
If (first < High) Then Quick_Sort SortArray, first, High
If (Low < last) Then Quick_Sort SortArray, Low, last
End Sub
Function OrdinalNum(num) As String
Dim Suffix As String
OrdinalNum = num
If Not IsNumeric(num) Then Exit Function
If num <> Int(num) Then Exit Function
Select Case num Mod 10
Case Is = 1
Suffix = "st"
Case Is = 2
Suffix = "nd"
Case Is = 3
Suffix = "rd"
Case Else
Suffix = "th"
End Select
Select Case num Mod 100
Case 11 To 19
Suffix = "th"
End Select
OrdinalNum = Format(num, "#,##0") & Suffix
End Function
The output will be just as you show in your request above. But could be easily modified:
Otros consejos
I suggest you an alternate way to solve this problem.
You can transfer a b c d to 1 2 4 8 (in binary is 01 10 100 1000).
a+b+c = a+c+b =... = 7 (111)
a+c = c+a = 5 (101)
So you can use the sum value to group by in excel.
The function to transfer single char to digital is very simple:
A B C POWER(2,CODE(A2) - 97) POWER(2,CODE(A2) - 97) POWER(2,CODE(A2) - 97) SUM(D2:F2)
-+-+-+-----------------------+-----------------------+-----------------------+----------
b|a|c|2 |1 |4 |7
c|a|b|4 |1 |2 |7
a|c|b|1 |4 |2 |7
a|c| |1 |4 |0 |5
a|c|d|1 |4 |8 |13
b|c|a|2 |4 |1 |7
d|c|a|8 |4 |1 |13
Wish this method can help you to find your own way to solve your problem.
This question is so interesting. It's a good sample for showing how to use mathematics to provide a simpler solution.
I had to add another answer because I realized to find repeat combinations of three words is the same as to calculate the distance in three-space from the zero point - it is only necessary to give each word a different number. And this answer can handle the a+a problem that Pnuts mentioned before.
Different from my last answer, if you have 200 phrases and combinations within three members, the biggest number calculated is 120000 (POWER(200,2)*3), my last answer is 1.60694E+60 (POWER(2,200)). My last answer may solve the problem logically, but cannot be implemented in Excel or many programming languages. It uses a permutations solution to solve a combinations problem.
Here is the solution using distance in three-space, it's simple and easily extendable.
- Map each word to a different number. (VLOOKUP is one method, you may have alternative ways.) The resulting numbers don't need to be continuous, only different from one another, and the maximum number should be less than SQRT(POWER(2,32)/3)).
- Calculate the distance using the formula in G1.
- Group and Count use column G. (There may ways you could find in other answers.)
- Notice: I use '_' replaced space cell, to map a number for space, so you can make a_a equal to aa_ (line 4 and 5). Any choice should have a number for space.
Any advice to improve this answer would be appreciated.
Almost a formula only solution, assuming data is in labelled ColumnsA:C, in D2:
=VLOOKUP(A2,weight,2,0)+IFNA(VLOOKUP(B2,weight,2,0),)+IFNA(VLOOKUP(C2,weight,2,0),)
copied down to suit, where weight
(green in the image) is a named range for a lookup table (constructed along the lines suggested by @Jaugar Chang). In E2 and copied down to suit:
=IF(COUNTIF(D$2:D2,D2)=1,COUNTIF(D:D,D2),"")
in G1:
=ROW()&MID("thstndrdthstndrdth",MATCH(IF(MOD(ROW(),100)>29,MOD(ROW(),10)+20,MOD(ROW(),100)),{0,1,2,3,4,21,22,23,24},1)*2-1,2)&" place: """&INDIRECT("A"&MATCH(H1,E:E,0))&"+"&INDIRECT("B"&MATCH(H1,E:E,0))&"+"&INDIRECT("C"&MATCH(H1,E:E,0))&""" found"
in H1:
=LARGE(E:E,ROW())
in I1:
=IF(H1>2,"times",IF(H1=1,"","twice"))
Each of the last three copied down until just short of an error message.
ColumnH formatted:
[=1] "once";General
Output is highlighted yellow:
In this example there is a surplus +
and the possibility of ++
surplus.
The way I would do this is using a dictionary to go through the list and count the rows. The key would be the row itself, so I could use the Dictionary.Exists(Key)
method of the dictionary to see if I have already encountered that row. The value associated to each key would be an integer which I'd increment every time I come across the same row again.
After parsing the list I would iterate the dictionary to output they key and value to a column in excel. Finally I'd use sort on the range in which I output the results to sort them by frequency.
This is pretty easy stuff, but you need to reference the Microsoft Scripting Runtime
to use the dictionary object (see here for example http://www.techbookreport.com/tutorials/vba_dictionary.html).
Hope this helps.
UPDATE
Since you said you might give this method a try in vba. I'd thought I'd add something the always tripped me up when I first used the Collection
and Dictionary
objects. When iterating through the entries, the iteration variable has to be a Variant
. I was used to having to declare the iteration variable of the same type as that of the data I was iterating, but that will give you an error in vba.
Here's my version using array manipulation, then some range manipulation.
Edit1: I've read pnut's comment about handling b only. Btw, this will not handle a+a
Sub Test()
Dim arr, unq
Dim orng As Range, rng As Range, srng As Range
Dim i As Long, k As Long
Dim check As Boolean: check = False
Dim freq As String
'~~> pass range data to array
Set orng = Sheet1.Range("A1", _
Sheet1.Range("A" & Sheet1.Rows.Count).End(xlUp))
For Each rng In orng
If Not IsArray(arr) Then
arr = Array(RngToArr(rng.Resize(, 3)))
Else
ReDim Preserve arr(UBound(arr) + 1)
arr(UBound(arr)) = RngToArr(rng.Resize(, 3))
End If
Next
'~~> pass unique combination and count to another array
For i = LBound(arr) To UBound(arr)
If IsEmpty(unq) Then
ReDim unq(1 To 2, 1 To 1)
unq(1, 1) = arr(i)
unq(2, 1) = unq(2, 1) + 1
Else
For k = LBound(unq, 2) To UBound(unq, 2)
If CompArr(arr(i), unq(1, k)) Then
check = False
unq(2, k) = unq(2, k) + 1
Exit For
Else
check = True
End If
Next
If check Then
ReDim Preserve unq(1 To 2, 1 To UBound(unq, 2) + 1)
unq(1, UBound(unq, 2)) = arr(i)
unq(2, UBound(unq, 2)) = unq(2, UBound(unq, 2)) + 1
End If
End If
Next
'~~> Transpose and tidy up the array
ReDim tally(1 To UBound(unq, 2), 1 To 2)
For i = LBound(unq, 2) To UBound(unq, 2)
tally(i, 1) = Join$(unq(1, i), "+")
tally(i, 2) = unq(2, i)
Next
'~~> sort in worksheet, easier than sorting array
With Sheet1
Set srng = .Range("E1:F" & UBound(tally, 1))
srng = tally
.Sort.SortFields.Clear
.Sort.SortFields.Add Key:=srng.Offset(0, 1).Resize(, 1), _
SortOn:=xlSortOnValues, Order:=xlDescending, _
DataOption:=xlSortNormal
With .Sort
.SetRange srng
.Header = xlGuess
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
End With
'~~> do some manipulation to make it closer to what you want
For Each rng In srng.Offset(0, 1).Resize(, 1)
Select Case rng.Value
Case 1: freq = "found once"
Case 2: freq = "found twice"
Case Else: freq = "found " & rng.Value & " times"
End Select
rng.Value = freq
Next
End Sub
Private Function CompArr(list1, list2) As Boolean
Dim j As Long: CompArr = True
For j = LBound(list1) To UBound(list1)
With Application
If IsError(.Match(list1(j), list2, 0)) _
Then CompArr = False
End With
Next
End Function
Private Function RngToArr(r As Range) As Variant
Dim c As Range, a
For Each c In r
If Len(c.Value) <> 0 Then
If Not IsArray(a) Then
a = Array(c.Value)
Else
ReDim Preserve a(UBound(a) + 1)
a(UBound(a)) = c.Value
End If
End If
Next
RngToArr = a
End Function
Result:
Not exactly the way you want it, I was not able to come-up how to set up 1st Place, 2nd Place, etc. dynamically.
Also, I did not go deep on the plus(+) sign. If there are blanks, result maybe +b+c, or a+c+ or a++c.
Anyways, HTH.