Question

I have some functions, part of a large analysis package, that require a boolean mask to divide array items into two groups. These functions look like this:

def process(data, a_mask):
    b_mask = -a_mask
    res_a = func_a(data[a_mask])
    res_b = func_b(data[b_mask])
    return res_a, res_b

Now I need to use these functions (without modification) on a big array that has items of class "a" only, but I would like to save RAM and not pass a boolean mask that is all True. For example, I could pass a slice such as slice(None, None).

The problem is that the line b_mask = -a_mask will fail if a_mask is a slice. Ideally -a_mask should give a 0-items selection.

I was thinking of creating a "modified" slice object that implements the __neg__() method as a null slice (for example slice(0, 0)). I don't know if this is possible.

Other solutions that leave the process() function unmodified while avoiding the allocation of an all-True boolean array are welcome as well.


Solution

Unfortunately we can't add a __neg__() method to slice, since it cannot be subclassed. However, tuple can be subclassed, and we can use it to hold a single slice object.

This leads me to a very, very nasty hack which should just about work for you:

class NegTuple(tuple):
    def __neg__(self):
        return slice(0)

We can create a NegTuple containing a single slice object:

nt = NegTuple((slice(None),))

This can be used as an index, and negating it will yield an empty slice resulting in a 0-length array being indexed:

import numpy as np

a = np.arange(5)
print(a[nt])
# [0 1 2 3 4]
print(a[-nt])
# []
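
To see the hack end to end, here is a minimal sketch that passes a NegTuple into an unmodified process. func_a and func_b are hypothetical stand-ins (the real ones are not shown in the question), and the __invert__ alias is an assumption added for code that negates masks with ~, since recent NumPy releases reject unary minus on boolean arrays.

```python
import numpy as np


class NegTuple(tuple):
    """Tuple subclass usable as a NumPy index; its negation selects nothing."""

    def __neg__(self):
        return slice(0)  # empty selection

    # Assumption: also answer to `~`, in case the caller uses it instead of `-`.
    __invert__ = __neg__


def func_a(x):  # hypothetical stand-in for the real func_a
    return x.sum()


def func_b(x):  # hypothetical stand-in for the real func_b
    return x.size


def process(data, a_mask):  # unchanged from the question
    b_mask = -a_mask
    res_a = func_a(data[a_mask])
    res_b = func_b(data[b_mask])
    return res_a, res_b


data = np.arange(5)
all_a = NegTuple((slice(None),))
res_a, res_b = process(data, all_a)

# Indexing with a tuple holding one slice is basic indexing, so no copy is made.
selection_is_view = np.shares_memory(data[all_a], data)
```

No all-True mask is allocated here, and data[all_a] shares memory with data rather than copying it.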

You would have to be very desperate to resort to something like this, though. Is it totally out of the question to modify process like this?

def process(data, a_mask=None):
    if a_mask is None:
        a_mask = slice(None)  # every element
        b_mask = slice(0)     # no elements
    else:
        b_mask = -a_mask
    res_a = func_a(data[a_mask])
    res_b = func_b(data[b_mask])
    return res_a, res_b

This is much more explicit, and should not have any effect on its behavior for your current use cases.
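
As a rough check of that claim, the sketch below calls the optional-mask variant both ways. func_a and func_b are again hypothetical placeholders, and the mask is inverted with ~ rather than -, which is an assumption made so the example runs on recent NumPy releases.

```python
import numpy as np


def func_a(x):  # hypothetical placeholder
    return x.sum()


def func_b(x):  # hypothetical placeholder
    return x.size


def process(data, a_mask=None):
    if a_mask is None:
        a_mask = slice(None)  # every element
        b_mask = slice(0)     # no elements
    else:
        b_mask = ~a_mask      # assumption: `~` instead of `-` for boolean arrays
    return func_a(data[a_mask]), func_b(data[b_mask])


data = np.arange(6)
mask = data % 2 == 0                # True for 0, 2, 4

with_mask = process(data, mask)     # existing call style still works
without_mask = process(data)        # no mask array is allocated
```

Existing callers keep passing a boolean array, while the all-"a" case simply omits the argument.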

Other answers

Your solution is very similar to a degenerate sparse boolean array, although I don't know of any implementations of the same. My knee-jerk reaction is one of dislike, but if you really can't modify process it's probably the best way.

If you are concerned about memory use, then advanced indexing (which includes indexing with a boolean mask) may be a bad idea. From the docs:

Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).

As it stands, the process function has:

  • data of size n say
  • a_mask of size n (assuming advanced indexing)

And creates:

  • b_mask of size n
  • data[a_mask] of size m say
  • data[b_mask] of size n - m

In total this is effectively 4 arrays of size n: data, a_mask, b_mask, and the two indexed copies, which together hold n items.
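
The copy-versus-view difference behind this accounting can be checked directly. The following is a small sketch with an arbitrary array size; note that a boolean mask stores one byte per element, so "size n" above counts elements rather than bytes.

```python
import numpy as np

data = np.arange(1000, dtype=np.float64)
a_mask = np.ones(data.size, dtype=bool)   # the all-True mask we want to avoid

masked = data[a_mask]        # advanced (boolean) indexing
sliced = data[slice(None)]   # basic slicing

masked_is_copy = not np.shares_memory(masked, data)
sliced_is_view = np.shares_memory(sliced, data)

mask_bytes = a_mask.nbytes   # one byte per element
data_bytes = data.nbytes     # eight bytes per float64 element
```

With the mask, both the mask itself and a full copy of data are allocated; with the slice, neither is.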

Basic slicing seems to be your best option then; however, Python does not allow subclassing slice:

TypeError: Error when calling the metaclass bases
    type 'slice' is not an acceptable base type

See @ali_m's answer for a solution that incorporates slicing.

Alternatively, you could just bypass process and get your results as

result = func_a(data), func_b([])
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow