How to implement a prime guesser?

Hi, I was sent here from StackOverflow.

I’m trying to implement the neural network described in this paper: https://journals.sdu.edu.kz/index.php/ais/article/view/410

I looked at the official examples and tried to adapt the AdditionModelWithSeq2Seq example to match my requirements, but I’m neither sure how to read in my data file correctly, nor do I know whether a computation graph or a multilayer network is the right approach to use here.

My inputs are 32-bit numbers (n) that are each the product of two primes (p, q). The input numbers should be fed in as binary vectors, and the network should output p.

The network described in the paper looks like this:
(LSTM
Batch Normalization
Dropout) x 3
(Dense
Batch Normalization
Dropout) x 2
Dense

My current approach for the computation graph looks like this:

    ArrayList<String> inAndOutNames = new ArrayList<>();
    String[] inputNames = new String[inputAmount];
    InputType[] inputTypes = new InputType[inputAmount + 1];
    for(int i = 0; i < inputAmount; i++)
    {
        inAndOutNames.add("bit" + i);
        inputNames[i] = "bit" + i;
        inputTypes[i] = InputType.recurrent(1);
    }
    inAndOutNames.add("p");
    inputTypes[inputAmount] = InputType.recurrent(1);

    ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
            .weightInit(WeightInit.XAVIER)
            .updater(new Adam(0.001))
            .seed(seed)
            .graphBuilder()
            .addInputs(inAndOutNames)
            .setInputTypes(inputTypes)
            .addLayer("l0", new DenseLayer.Builder().nIn(inputAmount).nOut(inputAmount).build(), inputNames)
            .addLayer("l1", new LSTM.Builder().nIn(inputAmount).nOut(128).activation(Activation.TANH).build(), "l0")
            .addLayer("l2", new LSTM.Builder().nIn(128).nOut(256).build(), "l1")
            .addLayer("l3", new DenseLayer.Builder().nIn(256).nOut(256).build(), "l2", "p")
            .addLayer("lOut", new DenseLayer.Builder().nIn(256).nOut(10).build(), "l3")
            .setOutputs("lOut")
            .build();

    model = new ComputationGraph(configuration);

This is my attempt to read in my test data:

        RecordReader recordReader;
        recordReader = new CSVRecordReader(0, ",");
        recordReader.initialize(new FileSplit(new File("src/main/resources/datasets/32-Bit x 10000 RSA Bit-Combinations (2023-03-20-12-45-16)")));

        dl.setInputAmount(bits);

        MultiDataSetIterator dataSetIterator = new NewCustomPrimeIterator(bits);
        dataSetIterator = new RecordReaderMultiDataSetIterator.Builder(100)
                .addReader("reader", recordReader)
                .addInput("reader", 0, 31)
                .addOutput("reader", 32, 32)
                .build();

Let’s work step by step here and solve your data loading problem first.

What problem do you run into? Is it simply not the data you expect? Do you have issues even checking how your loaded data looks?

One line of the data I’m using looks like this:

1,1,0,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0,1,1,0,1,46271

The first 32 values are the bits of n, the last number is p.
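
For reference, parsing one such line in Java could look like this (a quick sketch, not code from my project):

    String line = "1,1,0,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0,0,1,1,0,1,1,0,0,0,0,1,1,0,1,46271";
    String[] parts = line.split(",");
    int[] bitsOfN = new int[32];
    for (int i = 0; i < 32; i++) {
        bitsOfN[i] = Integer.parseInt(parts[i]); // the 32 bits of n
    }
    long p = Long.parseLong(parts[32]); // the label: the prime factor p (46271)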

When I run my code, I get the error: “Invalid input array: network has 33 inputs, but array is of length 1”

@Fi0x Did you generate the dataset already?
I see this from the paper:

    Layer (type)          Output Shape      Param #
    LSTM                  (None, 1, 128)    76288
    Batch Normalization   (None, 1, 128)    512
    Dropout               (None, 1, 128)    0
    LSTM                  (None, 1, 256)    394240
    Batch Normalization   (None, 1, 256)    1024
    Dropout               (None, 1, 256)    0
    LSTM                  (None, 512)       1574912
    Batch Normalization   (None, 512)       2048
    Dropout               (None, 512)       0
    Dense                 (None, 128)       65664
    Batch Normalization   (None, 128)       512
    Dropout               (None, 128)       0
    Dense                 (None, 100)       12900
    Batch Normalization   (None, 100)       400
    Dropout               (None, 100)       0
    Dense                 (None, 10)        1010

From the looks of it you’re trying to use multiple inputs. Your graph should be something like this:

    ComputationGraphConfiguration.GraphBuilder graphBuilder = new NeuralNetConfiguration.Builder()
            .seed(12345)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Adam(0.001))
            .weightInit(WeightInit.XAVIER)
            .graphBuilder()
            .addInputs("input")
            .setInputTypes(InputType.recurrent(inputSize));

    graphBuilder
            .addLayer("lstm1", new LSTM.Builder().nOut(128).activation(Activation.TANH).build(), "input")
            .addLayer("batchNorm1", new BatchNormalization.Builder().nOut(128).build(), "lstm1")
            .addLayer("dropout1", new DropoutLayer.Builder(0.5).build(), "batchNorm1")

            .addLayer("lstm2", new LSTM.Builder().nOut(256).activation(Activation.TANH).build(), "dropout1")
            .addLayer("batchNorm2", new BatchNormalization.Builder().nOut(256).build(), "lstm2")
            .addLayer("dropout2", new DropoutLayer.Builder(0.5).build(), "batchNorm2")

            .addLayer("lstm3", new LSTM.Builder().nOut(512).activation(Activation.TANH).build(), "dropout2")
            .addVertex("lastTimeStep", new LastTimeStepVertex("input"), "lstm3")
            .addLayer("batchNorm3", new BatchNormalization.Builder().nOut(512).build(), "lastTimeStep")
            .addLayer("dropout3", new DropoutLayer.Builder(0.5).build(), "batchNorm3")

            .addLayer("dense1", new DenseLayer.Builder().nOut(128).activation(Activation.RELU).build(), "dropout3")
            .addLayer("batchNorm4", new BatchNormalization.Builder().nOut(128).build(), "dense1")
            .addLayer("dropout4", new DropoutLayer.Builder(0.5).build(), "batchNorm4")

            .addLayer("dense2", new DenseLayer.Builder().nOut(100).activation(Activation.RELU).build(), "dropout4")
            .addLayer("batchNorm5", new BatchNormalization.Builder().nOut(100).build(), "dense2")
            .addLayer("dropout5", new DropoutLayer.Builder(0.5).build(), "batchNorm5")

            .addLayer("output", new OutputLayer.Builder().nOut(10).activation(Activation.SOFTMAX).build(), "dropout5")
            .setOutputs("output");

    ComputationGraph graph = new ComputationGraph(graphBuilder.build());
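
Training would then be the usual init/fit cycle; something like this (numEpochs and trainIterator here are placeholders for whatever you use):

    graph.init();
    for (int epoch = 0; epoch < numEpochs; epoch++) {
        graph.fit(trainIterator); // trainIterator: a DataSetIterator over your prime dataset
        trainIterator.reset();    // rewind the iterator for the next epoch
    }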

Could you clarify why you didn’t follow the network in the paper (e.g., adding batch normalization), and why you added multiple inputs? The paper defines the exact network you need. The main issue in your setup is probably the input data.

Note that I’m not sure about the number of initial inputs; just set that to whatever you need and let setInputTypes do the rest.

I didn’t read the paper in full, but I did skim it for at least the network architecture.

Did you try following the paper for the exact details of the data?

Thanks a lot for your effort!

Yes, I already generated the dataset, but I could change it anytime without much effort if that would help.

The inputs I was trying to use were the individual bits, and the number “p”. I might have implemented that part incorrectly.

I didn’t add the complete network because I was still struggling with the inputs, so I thought I should solve that first. I also don’t know a lot about neural networks and how they are defined, so I didn’t fully understand how to implement the network from the paper correctly.

The dataset I use is slightly different from what the paper uses. In the paper, both the 32-bit number and “p” are in binary representation. In my dataset I kept “p” as a base-10 number, but it wouldn’t be a problem to change that.

I’m now using the network layout you suggested, but this error occurs: “3D input expected to RNN layer expected, got 2”. Do I have to change the dataset in some way to fix this, or can it be done with a change in the MultiDataSetIterator?

@Fi0x sorry, I forgot to look into this. Please ping me in the future if you don’t hear from me.

I guess you’ve moved on by now, but let me at least put some thoughts here for future reference. Your input needs to be reshaped to 3D: the LSTM at the beginning requires 3D input, so the data has to be represented as a sequence. You would need to create the DataSet in 3D, like this:
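
A minimal sketch of what that could look like, assuming 32 bit-features per example and a single time step (the names here are illustrative; the bits are taken from the sample line earlier in the thread):

    int batchSize = 2;
    int featureCount = 32; // one feature per bit of n
    int timeSteps = 1;     // a single time step per example

    // LSTM layers expect features shaped [miniBatchSize, featureCount, timeSeriesLength]
    INDArray features = Nd4j.zeros(batchSize, featureCount, timeSteps);
    // Labels are 2D here because LastTimeStepVertex collapses the sequence before the output layer
    INDArray labels = Nd4j.zeros(batchSize, 10);

    // Fill example 0: bit j of n goes to feature j at time step 0
    int[] bitsOfN = {1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0,
                     0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1};
    for (int j = 0; j < featureCount; j++) {
        features.putScalar(new int[]{0, j, 0}, bitsOfN[j]);
    }

    DataSet dataSet = new DataSet(features, labels);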

Beyond that, this isn’t a multi-input problem. You just need a standard single sequence in. The “multi” is for more complicated problems where you start with different paths in your network, for example images and text, where you want to learn how to describe the input image.

Hi @agibsonccc , thanks for the reply.

I haven’t found another solution yet, so I’m happy you still replied.

I will implement the modified DataSetReader and let you know if this worked.

@Fi0x great. Please do ping me if you don’t hear from me. Sorry for the delay!

Hi @agibsonccc ,

I tried adjusting the code you recommended, but I’m having trouble with the correct shape.
My code for the DataSetReader currently looks like this:

public class PrimeDataSetReader extends BaseDataSetReader
{
public PrimeDataSetReader(File file)
{
    filePath = file.toPath();
    doInitialize();
}

public DataSet next(int num)
{
    INDArray features = Nd4j.create(new int[]{num, 1, 1}, 'f');
    INDArray labels = Nd4j.create(new int[]{num, 1, 1}, 'f');

    for(int i = 0; i < num && iter.hasNext(); i++)
    {
        String featureStr = iter.next();
        currentCursor++;
        featureStr = featureStr.replaceAll(",", "");
        String[] featureAry = featureStr.split("");
        for(int j = 0; j < featureAry.length - 1; j++)
        {
            int feature = Integer.parseInt(featureAry[j]);
            int label = Integer.parseInt(featureAry[j + 1]);
            features.putScalar(new int[]{i, feature, j}, 1.0);
            labels.putScalar(new int[]{i, label, j}, 1.0);
        }
    }
    return new DataSet(features, labels, null, null);
}
}

I’m using the GraphConfiguration you suggested and the DataSetIterator based on the code from the “LotteryPrediction” example.
But when I’m running my code, I get this illegal argument exception:

input.size(1) does not match expected input size of 128 - got input array with shape [1, 100, 128, 1]

What do I need to change to fix this?

@Fi0x can you post the full network? I don’t know what you’re using currently.

The configuration:

    ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
            .seed(seed)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Adam(0.001))
            .weightInit(WeightInit.XAVIER)
            .graphBuilder()
            .addInputs("input")
            .setInputTypes(InputType.recurrent(inputAmount))
            .addLayer("lstm1", new LSTM.Builder().nOut(128).activation(Activation.TANH).build(), "input")
            .addLayer("batchNorm1", new BatchNormalization.Builder().nOut(128).build(), "lstm1")
            .addLayer("dropout1", new DropoutLayer.Builder(0.5).build(), "batchNorm1")

            .addLayer("lstm2", new LSTM.Builder().nOut(256).activation(Activation.TANH).build(), "dropout1")
            .addLayer("batchNorm2", new BatchNormalization.Builder().nOut(256).build(), "lstm2")
            .addLayer("dropout2", new DropoutLayer.Builder(0.5).build(), "batchNorm2")

            .addLayer("lstm3", new LSTM.Builder().nOut(512).activation(Activation.TANH).build(), "dropout2")
            .addVertex("lastTimeStep", new LastTimeStepVertex("input"), "lstm3")
            .addLayer("batchNorm3", new BatchNormalization.Builder().nOut(512).build(), "lastTimeStep")
            .addLayer("dropout3", new DropoutLayer.Builder(0.5).build(), "batchNorm3")

            .addLayer("dense1", new DenseLayer.Builder().nOut(128).activation(Activation.RELU).build(), "dropout3")
            .addLayer("batchNorm4", new BatchNormalization.Builder().nOut(128).build(), "dense1")
            .addLayer("dropout4", new DropoutLayer.Builder(0.5).build(), "batchNorm4")

            .addLayer("dense2", new DenseLayer.Builder().nOut(100).activation(Activation.RELU).build(), "dropout4")
            .addLayer("batchNorm5", new BatchNormalization.Builder().nOut(100).build(), "dense2")
            .addLayer("dropout5", new DropoutLayer.Builder(0.5).build(), "batchNorm5")

            .addLayer("output", new OutputLayer.Builder().nOut(10).activation(Activation.SOFTMAX).build(), "dropout5")
            .setOutputs("output")
            .build();

The Iterator:

public class PrimeDataSetIterator implements DataSetIterator
{
private final BaseDataSetReader recordReader;
private final int batchSize;
private DataSet last;
private boolean useCurrent;

public PrimeDataSetIterator(String filePath, int batchSize)
{
    this.recordReader = new PrimeDataSetReader(new File(filePath));
    this.batchSize = batchSize;
}

@Override
public DataSet next(int i)
{
    return recordReader.next(i);
}

public int totalExamples()
{
    return recordReader.totalExamples();
}

@Override
public int inputColumns()
{
    if(last == null)
    {
        DataSet next = next();
        last = next;
        useCurrent = true;
        return next.numInputs();
    } else
    {
        return last.numInputs();
    }
}

@Override
public int totalOutcomes()
{
    if(last == null)
    {
        DataSet next = next();
        last = next;
        useCurrent = true;
        return next.numOutcomes();
    } else
    {
        return last.numOutcomes();
    }
}

@Override
public boolean resetSupported()
{
    return true;
}

@Override
public boolean asyncSupported()
{
    return true;
}

@Override
public void reset()
{
    recordReader.reset();
    last = null;
    useCurrent = false;
}

@Override
public int batch()
{
    return batchSize;
}

@Override
public void setPreProcessor(DataSetPreProcessor dataSetPreProcessor)
{

}

@Override
public DataSetPreProcessor getPreProcessor()
{
    throw new UnsupportedOperationException("Not supported");
}

@Override
public List<String> getLabels()
{
    return null;
}

@Override
public boolean hasNext()
{
    return recordReader.hasNext();
}

@Override
public DataSet next()
{
    if(useCurrent)
    {
        useCurrent = false;
        return last;
    } else
    {
        return next(batchSize);
    }
}
}

The DataSetReader:

public class PrimeDataSetReader extends BaseDataSetReader
{
public PrimeDataSetReader(File file)
{
    filePath = file.toPath();
    doInitialize();
}

public DataSet next(int num)
{
    INDArray features = Nd4j.create(new int[]{num, 1, 1}, 'f');
    INDArray labels = Nd4j.create(new int[]{num, 1, 1}, 'f');

    for(int i = 0; i < num && iter.hasNext(); i++)
    {
        String featureStr = iter.next();
        currentCursor++;
        featureStr = featureStr.replaceAll(",", "");
        String[] featureAry = featureStr.split("");
        for(int j = 0; j < featureAry.length - 1; j++)
        {
            int feature = Integer.parseInt(featureAry[j]);
            int label = Integer.parseInt(featureAry[j + 1]);
            features.putScalar(new int[]{i, feature, j}, 1.0);
            labels.putScalar(new int[]{i, label, j}, 1.0);
        }
    }
    return new DataSet(features, labels, null, null);
}
}

It looks like you’ll want to configure the time series length (1) as well as set the size to 128.
We appear to be throwing an error due to expecting 128 there.

What are your features? We default to NCW for the input format.

Do you know what your number of features is supposed to be in the paper?

Where would I configure the time series length and the size?

I’m not sure what you mean by “What are your features?”; I think the features should be the individual bits of the number I’m using.
I also don’t know what the NCW format is.

In the paper they used different datasets with varying bit sizes. I’m currently using 32-bit numbers, so the feature size would be 32, I suppose. (I don’t know a lot about NNs, so please correct me if I’m wrong.)

@Fi0x Read up on RNNs here:

Again ping me if you don’t hear from me :wink:

NCW is “Number of examples (batch size), Channels (number of features), Width (time series length)”

It’s mainly used in CNNs but can overlap since 1d convolutional nets also process time series.

Based on your response, your answer then is “I’m encoding sequences using binary”.
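
Applied to your data, assuming 32 bit-features and a single time step, the feature array in NCW layout would presumably be (batchSize is a placeholder):

    // NCW: [miniBatchSize (N), featureCount (C), timeSeriesLength (W)]
    INDArray features = Nd4j.zeros(batchSize, 32, 1);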

Do you have any updated code for me to look at?

Hi @agibsonccc ,

I’m still using the code I posted last.

I already read the documentation you suggested, but I still don’t know where exactly I could fix the time series length and the size.

I’m also very confused by the error, because it shows 4 dimensions, and in the documentation I only found 3D data.
The only part I could figure out on my own was that the second number is equal to the batch size I’m using, but the third seems to be related to it in some way.
It would be nice to know what the individual parts of the shape in the error represent.

@Fi0x sorry, that 4D error confusion is a good point. That comes from the fact that it’s mainly used in CNNs, where there’s also a height.

Does the paper describe how to create the dataset at all? (Sorry I don’t have time to read the whole thing)

Usually the time series length would be the number of digits you’d want to learn.
The batch size would be the number of sequences.
The features would be however many columns you’d have as observations?

You might want to look at one of our examples like this: deeplearning4j-examples/AdditionModelWithSeq2Seq.java at 686db99fee3d4825ee70663e1a15aa8d6216f2c2 · deeplearning4j/deeplearning4j-examples · GitHub

The way we encode the data for each character is based on the number of characters, as a one-hot encoding (a 1 at the index where the character occurs, 0 everywhere else). This sounds fairly similar.
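
For example, one-hot encoding the digit 7 over an alphabet of ten characters would look like this (a sketch):

    // 1.0 at the index of the character, 0.0 everywhere else
    INDArray oneHot = Nd4j.zeros(10);
    oneHot.putScalar(7, 1.0); // [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]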

Have you looked at anything like this yet?

The paper doesn’t describe the exact creation of the data. The only information it provides is that the number n (the product of primes p and q) is used as input and the smaller of p and q is used as the label. Both numbers are represented in binary format.
Description from the paper:

[image: dataset description from the paper]

So would the number of digits I want to learn be the input bit size (32 in my case), or would the “binary vectors” from the paper be represented as 32 features with a time series length of 1? (I’m not sure how a vector would be handled correctly in an NN.)

I looked at the examples already and tried adapting the model you posted. But I might have overlooked something regarding the data formatting. I’ll post my updated code when I’m done with that.

@Fi0x that sounds right.
Raw data would usually be pixels, columns in a spreadsheet, or sensor readings. For characters it’s just the one-hot. It’s actually a very similar situation here.

The timestep of 1 should be ok. Ping me when you get that done.

Hi @agibsonccc,
I tried to fix the sizes, but I still don’t know how to do that exactly.

However, I’ve found out that the “next()” method is called 4 times before the error occurs (from somewhere in the “fit()” method of the graph), which makes me wonder why the error doesn’t occur the first time.

I was also confused by the numbers in the shape. The first and last always stay at 1, and the second is equal to the batch size I use. The third is always 128, which might refer to the 4x32?

My code currently looks like this:
Calling all the stuff:

    int bits = 32;
    int batchSize = 2;
    String fileName = "src/main/resources/datasets/32-Bit Testset";

    try
    {
        DeepLearningPrimeFactorizer dl = new DeepLearningPrimeFactorizer();

        dl.setInputAmount(1);

        DataSetIterator dataSetIterator = new PrimeDataSetIterator(new File(fileName).getAbsolutePath(), batchSize);
        dl.setData(dataSetIterator);

        dl.instantiateNetworkModel();

        dl.trainModel();
    } catch(Exception e)
    {
        throw new RuntimeException(e);
    }

The Graph Configuration:

    ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
            .seed(seed)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .updater(new Adam(0.001))
            .weightInit(WeightInit.XAVIER)
            .graphBuilder()
            .addInputs("input")
            .setInputTypes(InputType.recurrent(inputAmount))
            .addLayer("lstm1", new LSTM.Builder().nOut(128).activation(Activation.TANH).build(), "input")
            .addLayer("batchNorm1", new BatchNormalization.Builder().nOut(128).build(), "lstm1")
            .addLayer("dropout1", new DropoutLayer.Builder(0.5).build(), "batchNorm1")

            .addLayer("lstm2", new LSTM.Builder().nOut(256).activation(Activation.TANH).build(), "dropout1")
            .addLayer("batchNorm2", new BatchNormalization.Builder().nOut(256).build(), "lstm2")
            .addLayer("dropout2", new DropoutLayer.Builder(0.5).build(), "batchNorm2")

            .addLayer("lstm3", new LSTM.Builder().nOut(512).activation(Activation.TANH).build(), "dropout2")
            .addVertex("lastTimeStep", new LastTimeStepVertex("input"), "lstm3")
            .addLayer("batchNorm3", new BatchNormalization.Builder().nOut(512).build(), "lastTimeStep")
            .addLayer("dropout3", new DropoutLayer.Builder(0.5).build(), "batchNorm3")

            .addLayer("dense1", new DenseLayer.Builder().nOut(128).activation(Activation.RELU).build(), "dropout3")
            .addLayer("batchNorm4", new BatchNormalization.Builder().nOut(128).build(), "dense1")
            .addLayer("dropout4", new DropoutLayer.Builder(0.5).build(), "batchNorm4")

            .addLayer("dense2", new DenseLayer.Builder().nOut(100).activation(Activation.RELU).build(), "dropout4")
            .addLayer("batchNorm5", new BatchNormalization.Builder().nOut(100).build(), "dense2")
            .addLayer("dropout5", new DropoutLayer.Builder(0.5).build(), "batchNorm5")

            .addLayer("output", new OutputLayer.Builder().nOut(10).activation(Activation.SOFTMAX).build(), "dropout5")
            .setOutputs("output")
            .build();

    model = new ComputationGraph(configuration);

    model.init();
    model.setListeners(new ScoreIterationListener(iterationsBetweenScores));

The next() method in the Reader that is used in the Iterator:

    System.out.println("Getting elements: " + num);
    INDArray features = Nd4j.create(new int[]{num, 1, 1}, 'f');
    INDArray labels = Nd4j.create(new int[]{num, 1, 1}, 'f');

    for(int i = 0; i < num && iter.hasNext(); i++)
    {
        String featureStr = iter.next();
        currentCursor++;
        featureStr = featureStr.split(";")[0];
        String[] featureAry = featureStr.split(",");
        for(int j = 0; j < featureAry.length - 1; j++)
        {
            int feature = Integer.parseInt(featureAry[j]);
            int label = Integer.parseInt(featureAry[j + 1]);
            features.putScalar(new int[]{i, feature, j}, 1.0);
            labels.putScalar(new int[]{i, label, j}, 1.0);
        }
    }
    return new DataSet(features, labels, null, null);

I also changed my dataset to this format:
32 bits of ‘n’ separated by ‘,’, then the smaller prime of p and q, separated by ‘;’:

1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,1,0,0,1,0,1,0,0,1,0,1,0,0,0,1;49307

With that setup the error I’m currently getting is:

java.lang.RuntimeException: java.lang.IllegalArgumentException: input.size(1) does not match expected input size of 128 - got input array with shape [1, 2, 128, 1]

Which of the numbers from the shape needs to match the 128? And where could I change the first and the last number in the shape if that would be necessary?

@Fi0x DM me a reproducer I can run and I’ll take a look. Pass me a small dataset you’re using and a complete reproducer.

Your problem is complex enough that I think I just need to fix it for you.
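
For future readers: the shape errors above come from using the bit values as array indices in next(int num). A reader that produces the expected [batchSize, featureCount, timeSteps] arrays could look like the sketch below, under the thread’s assumptions (32 bit-features, one time step, and the 10-unit output from the paper). How p maps onto the 10 outputs was never settled in the thread, so the label encoding here is only a placeholder.

    public DataSet next(int num)
    {
        int featureCount = 32; // the bits of n
        INDArray features = Nd4j.zeros(num, featureCount, 1);
        INDArray labels = Nd4j.zeros(num, 10); // 2D: LastTimeStepVertex collapses the sequence

        for(int i = 0; i < num && iter.hasNext(); i++)
        {
            String line = iter.next();
            currentCursor++;
            String[] halves = line.split(";");
            String[] bits = halves[0].split(",");
            for(int j = 0; j < featureCount; j++)
            {
                // The bit VALUE is the entry; j is the feature index.
                // (The original code used the value as an index, which breaks the shape.)
                features.putScalar(new int[]{i, j, 0}, Integer.parseInt(bits[j]));
            }
            long p = Long.parseLong(halves[1]);
            // Placeholder label encoding: one-hot the last decimal digit of p
            labels.putScalar(new int[]{i, (int) (p % 10)}, 1.0);
        }
        return new DataSet(features, labels);
    }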