Help with and possible bug in BooleanIndexing

I’m looking to replicate some behavior in NumPy, namely slicing/indexing with booleans:

import numpy as np
a = np.arange(10)
mask = a < 5
a[mask]
# array([0, 1, 2, 3, 4])

I have found and played with the BooleanIndexing package and am close to a solution. The application is taking an integer list of cluster assignments and building the clusters from them. Here’s a stripped down example:

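        int[] assignments = {0,0,0,1,0,2,2};   // cluster id of each data point
        int[] indexes     = {0,1,2,3,4,5,7};   // index of each data point
        List<int[]> clusters = new ArrayList<>();

        INDArray asarray = Nd4j.createFromArray(assignments);
        INDArray idxarray = Nd4j.createFromArray(indexes);

        int i = 1;  // cluster id to extract
        // Intended: true where assignments == i
        INDArray mask = asarray.match(i, Conditions.equals());
        // Intended: keep only the indexes where the mask is true
        INDArray cluster = BooleanIndexing.applyMask(idxarray, mask);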

However, this produces clusters like the following:

[0, 1, 2, 0, 4, 0, 0, 0, 0, 0, ...

There are a few issues here, one of which might be a bug:

  1. The first data point is assigned to all clusters, because its index is zero and masked-out entries are also written as zero.
  2. Lots of extraneous zeros.
  3. The mask is incorrect for i > 0 (possibly a bug):

For example, with i = 1 the mask from the code above comes out as [false, false, false, true, false, true, true]. Notice that the last two booleans are true, which implies that (2 == 1) is true.
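For comparison, the mask I would expect can be written out in plain Java with no ND4J involved. This is only a sketch of the elementwise comparison. The names `ExpectedMask` and `equalsMask` are made up for illustration and are not part of any library.

```java
import java.util.Arrays;

public class ExpectedMask {
    // Hypothetical helper (not an ND4J call): mask[k] is true exactly when values[k] == target.
    public static boolean[] equalsMask(int[] values, int target) {
        boolean[] mask = new boolean[values.length];
        for (int k = 0; k < values.length; k++) {
            mask[k] = values[k] == target;
        }
        return mask;
    }

    public static void main(String[] args) {
        int[] assignments = {0, 0, 0, 1, 0, 2, 2};
        // Print the expected mask for each cluster id 0..2.
        for (int i = 0; i <= 2; i++) {
            System.out.println(i + ": " + Arrays.toString(equalsMask(assignments, i)));
        }
    }
}
```

For i = 1 this gives [false, false, false, true, false, false, false], so only the fourth element should match.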

So I’m at a bit of a loss as to how to continue, and whether or not this is a bug I should report. Is there a method like .getWhere() that takes a boolean array and returns only those elements that correspond to true values?
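In the meantime, one fallback is to do the selection in plain Java instead of through ND4J. The sketch below is not an ND4J API: `selectWhere` and `buildClusters` are hypothetical helpers, written under the assumption that cluster ids run from 0 to numClusters - 1. They keep only the entries whose mask value is true, which is the NumPy `a[mask]` behavior I am after.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClusterSketch {
    // Hypothetical helper: return only values[k] for which mask[k] is true.
    public static int[] selectWhere(int[] values, boolean[] mask) {
        if (values.length != mask.length) {
            throw new IllegalArgumentException("values and mask must have the same length");
        }
        int count = 0;
        for (boolean m : mask) {
            if (m) count++;
        }
        int[] out = new int[count];
        int j = 0;
        for (int k = 0; k < values.length; k++) {
            if (mask[k]) out[j++] = values[k];
        }
        return out;
    }

    // Hypothetical helper: one int[] of indexes per cluster id in 0..numClusters-1.
    public static List<int[]> buildClusters(int[] assignments, int[] indexes, int numClusters) {
        List<int[]> clusters = new ArrayList<>();
        for (int c = 0; c < numClusters; c++) {
            boolean[] mask = new boolean[assignments.length];
            for (int k = 0; k < assignments.length; k++) {
                mask[k] = assignments[k] == c;
            }
            clusters.add(selectWhere(indexes, mask));
        }
        return clusters;
    }

    public static void main(String[] args) {
        int[] assignments = {0, 0, 0, 1, 0, 2, 2};
        int[] indexes     = {0, 1, 2, 3, 4, 5, 7};
        for (int[] cluster : buildClusters(assignments, indexes, 3)) {
            System.out.println(Arrays.toString(cluster));
        }
    }
}
```

With the data above this gives [0, 1, 2, 4], [3] and [5, 7], with no padding zeros and index 0 appearing only in cluster 0.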

@wcneill Go ahead and report an issue, and make sure it includes a clear test case (e.g. this one is fine). We'll get this fixed next week, and you can then try it in snapshots.
