I’m looking to replicate some NumPy behavior in ND4J, namely slicing/indexing with booleans:
import numpy as np
a = np.arange(10)
mask = a < 5
a[mask]
# array([0, 1, 2, 3, 4])
I have found and played with Nd4j's BooleanIndexing class and am close to a solution. The application takes an integer list of cluster assignments and builds the clusters from them. Here's a stripped-down example:
import java.util.ArrayList;
import java.util.List;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.indexing.BooleanIndexing;
import org.nd4j.linalg.indexing.conditions.Conditions;

int[] assignments = {0, 0, 0, 1, 0, 2, 2};
int[] indexes = {0, 1, 2, 3, 4, 5, 7};
List<int[]> clusters = new ArrayList<>();

INDArray asarray = Nd4j.createFromArray(assignments);
INDArray idxarray = Nd4j.createFromArray(indexes);

int i = 1;
INDArray mask = asarray.match(i, Conditions.equals());
INDArray cluster = BooleanIndexing.applyMask(idxarray, mask);
clusters.add(cluster.toIntVector());
However, this results in clusters like the following:
[0, 1, 2, 0, 4, 0, 0, 0, 0, 0, ...
There are a few issues here, one of which might be a bug:
- The first data point is assigned to all clusters because its index is zero.
- Lots of extraneous zeros.
- The mask is incorrect for i > 0 (maybe a bug?): with i = 1, the mask above yields [false, false, false, true, false, true, true]. Notice that the last two booleans are true, which implies that (2 == 1) evaluated as true.
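In the meantime I can get the behavior I want in plain Java without BooleanIndexing. This is just a minimal sketch of the intended semantics (indexes[assignments == i] for each cluster id i), not how I'd like the final ND4J version to look:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class ClustersPlainJava {

    // Plain-Java equivalent of NumPy's indexes[assignments == i],
    // applied once per cluster id i.
    static List<int[]> buildClusters(int[] assignments, int[] indexes) {
        int numClusters = Arrays.stream(assignments).max().orElse(-1) + 1;
        List<int[]> clusters = new ArrayList<>();
        for (int i = 0; i < numClusters; i++) {
            final int cid = i;
            clusters.add(IntStream.range(0, assignments.length)
                    .filter(j -> assignments[j] == cid) // the "boolean mask" step
                    .map(j -> indexes[j])               // the "gather" step
                    .toArray());
        }
        return clusters;
    }

    public static void main(String[] args) {
        int[] assignments = {0, 0, 0, 1, 0, 2, 2};
        int[] indexes = {0, 1, 2, 3, 4, 5, 7};
        buildClusters(assignments, indexes)
                .forEach(c -> System.out.println(Arrays.toString(c)));
        // prints [0, 1, 2, 4] then [3] then [5, 7]
    }
}
```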
So I’m at a bit of a loss as to how to continue, and whether or not this is a bug I should report. Is there a method like .getWhere() that takes a boolean array and returns only those elements that correspond to true values?
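For clarity, here is roughly the signature and behavior I'm hoping exists (getWhere is a name I made up, not an actual Nd4j method):

```java
import java.util.ArrayList;
import java.util.List;

public class GetWhere {

    // Hypothetical helper: return only the elements of values whose
    // corresponding entry in mask is true, i.e. NumPy's a[mask].
    static int[] getWhere(int[] values, boolean[] mask) {
        List<Integer> kept = new ArrayList<>();
        for (int j = 0; j < values.length; j++) {
            if (mask[j]) {
                kept.add(values[j]);
            }
        }
        return kept.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 2, 3, 4, 5, 7};
        // correct mask for i = 1 against the assignments above
        boolean[] mask = {false, false, false, true, false, false, false};
        // getWhere(values, mask) should yield [3]
        System.out.println(getWhere(values, mask).length);
    }
}
```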