Create some test data
In [66]: def mklbl(prefix, n):
   ....:     return ["%s%s" % (prefix, i) for i in range(n)]
   ....:

In [67]: mi_total = pd.MultiIndex.from_product([mklbl('A', 1000), mklbl('B', 200)])

# note that these are random consecutive slices; that's just for illustration
In [68]: ms = [pd.Series(1, index=mi_total.take(np.arange(50000) + np.random.randint(0, 150000, size=1)))
   ....:       for i in range(20)]
In [69]: ms[0]
Out[69]:
A417 B112 1
B113 1
B114 1
B115 1
B116 1
B117 1
B118 1
B119 1
B120 1
B121 1
B122 1
B123 1
B124 1
B125 1
B126 1
...
A667 B97 1
B98 1
B99 1
B100 1
B101 1
B102 1
B103 1
B104 1
B105 1
B106 1
B107 1
B108 1
B109 1
B110 1
B111 1
Length: 50000, dtype: int64
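The construction above can be sketched at a smaller scale so it runs instantly; this uses the same `mklbl` helper and the same "consecutive slice at a random offset" idea, just with a 100 x 20 index instead of 1000 x 200 (the smaller sizes are my choice for illustration, not from the original):

```python
import numpy as np
import pandas as pd

def mklbl(prefix, n):
    # labels like 'A0', 'A1', ..., up to n-1
    return ["%s%s" % (prefix, i) for i in range(n)]

# full cross-product index: 100 * 20 = 2000 rows (scaled down from 1000 * 200)
mi_total = pd.MultiIndex.from_product([mklbl('A', 100), mklbl('B', 20)])

# each series covers a consecutive run of 500 rows starting at a random offset;
# randint returns a length-1 array, which broadcasts against the arange
ms = [
    pd.Series(1, index=mi_total.take(np.arange(500) + np.random.randint(0, 1500, size=1)))
    for i in range(20)
]

print(len(ms), len(ms[0]))  # 20 series, 500 rows each
```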
Shove everything into one long series, convert it to a frame (with the same index, which contains duplicates at this point), then sum over the index levels (which de-duplicates). This is equivalent to pd.concat(ms).groupby(level=[0,1]).sum(). (The sort at the end is just for illustration and is not necessary.) Though you probably want to sortlevel() to sort the index if you are doing any kind of indexing afterwards.
In [103]: pd.concat(ms).to_frame(name='value').sum(level=[0,1]).sort('value',ascending=False)
Out[103]:
value
A596 B109 14
A598 B120 14
B108 14
B109 14
B11 14
B110 14
B111 14
B112 14
B113 14
B114 14
B115 14
B116 14
B117 14
B118 14
B119 14
B12 14
B121 14
B106 14
B122 14
B123 14
B124 14
B125 14
B126 14
B127 14
B128 14
B129 14
B13 14
B130 14
B131 14
B132 14
B133 14
B134 14
B107 14
B105 14
B136 14
A597 B91 14
B79 14
B8 14
B80 14
B81 14
B82 14
B83 14
B84 14
B85 14
B86 14
B87 14
B88 14
B89 14
B9 14
B90 14
B92 14
A598 B104 14
A597 B93 14
B94 14
B95 14
B96 14
B97 14
B98 14
B99 14
A598 B0 14
...
[180558 rows x 1 columns]
Pretty fast now
In [104]: %timeit pd.concat(ms).to_frame(name='value').sum(level=[0,1]).sort('value',ascending=False)
1 loops, best of 3: 342 ms per loop
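On the sorting point above: if you are going to slice the result by label afterwards, sort the index first. `sortlevel()` is spelled `sort_index()` on modern pandas, and partial slicing on an unsorted MultiIndex can raise `UnsortedIndexError` or emit performance warnings. A sketch with made-up data:

```python
import pandas as pd

ix = pd.MultiIndex.from_tuples(
    [('A1', 'B0'), ('A0', 'B1'), ('A0', 'B0'), ('A1', 'B1')]
)
s = pd.Series([1, 2, 3, 4], index=ix)

# sort lexicographically so label-based indexing is fast and well-defined
s = s.sort_index()

# partial indexing on the first level now works efficiently
print(s.loc['A0'])
```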