Text Classification, need advice

Hi all,

I am trying to classify text into 4 categories. For the sake of illustration, the categories (labels) are as follows:
-booking
-weather
-timer
-bestSelling

Now, for each category, I have 120 documents for model training. For example, for the “booking” category:

        "I want to secure a flight for my journey.",
        "Can you help me reserve a table for dinner this evening?",
        "Please arrange a hotel room for the weekend.",
        "I need to purchase a ticket for the upcoming festival.",

…and so on

-Those training documents generate a vocabulary of 850+ distinct words in total.
-Each category contributes roughly the same number of distinct words.
-Here is the configuration I am using:

      MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
              .seed(123)
              .updater(new Adam(0.01))
              .list()
              // input layer: one unit per vocabulary term (bag-of-words vector)
              .layer(new DenseLayer.Builder()
                      .nIn(vocabulary.size())
                      .nOut(20)
                      .activation(Activation.RELU)
                      .build())
              .layer(new DenseLayer.Builder()
                      .nIn(20)
                      .nOut(15)
                      .activation(Activation.RELU)
                      .build())
              .layer(new DenseLayer.Builder()
                      .nIn(15)
                      .nOut(12)
                      .activation(Activation.RELU)
                      .build())
              .layer(new DenseLayer.Builder()
                      .nIn(12)
                      .nOut(10)
                      .activation(Activation.RELU)
                      .build())
              // output layer: softmax over the 4 categories
              .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                      .activation(Activation.SOFTMAX)
                      .nIn(10)
                      .nOut(4)
                      .build())
              .build();
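
For reference, a minimal training/evaluation loop around this configuration would look roughly like the sketch below. trainIter and testIter are assumed DataSetIterators over the featurized documents and their one-hot labels (which I have not shown); import paths are for recent DL4J versions:

      import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
      import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
      import org.nd4j.evaluation.classification.Evaluation;
      import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

      // build and initialize the network from the configuration above
      MultiLayerNetwork model = new MultiLayerNetwork(conf);
      model.init();
      model.setListeners(new ScoreIterationListener(10)); // log the training score every 10 iterations

      // train for a fixed number of epochs over the bag-of-words training data
      int numEpochs = 50;
      for (int epoch = 0; epoch < numEpochs; epoch++) {
          model.fit(trainIter);
      }

      // evaluate on held-out documents: accuracy, precision/recall and the confusion matrix
      Evaluation eval = model.evaluate(testIter);
      System.out.println(eval.stats());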


The problem:
Regardless of the configuration used (number of dense layers, number of nodes, etc.), the model's performance is very poor, and the predicted probabilities across the categories are almost always nearly identical, for example:

Predicted class probabilities: [0.2515209913253784, 0.2594231963157654, 0.24817872047424316, 0.24087709188461304]

The questions:
-Are 120 documents per category enough for model training?
-What rules should I follow when choosing a configuration?
-Do you see any problem in what I have described that is persistently eluding me?

@Java_Developer what’s the class distribution of each label? You might need to apply regularization or some sort of resampling if your model isn’t generalizing well.
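
In DL4J, regularization goes directly on the configuration builder. Below is a rough, untuned sketch of where L2 weight decay and dropout would sit, trimmed to a single hidden layer to keep it short (it reuses the imports from your configuration plus org.deeplearning4j.nn.weights.WeightInit; 1e-4 and 0.5 are just illustrative starting values, not tuned for your data):

      MultiLayerConfiguration regConf = new NeuralNetConfiguration.Builder()
              .seed(123)
              .updater(new Adam(0.01))
              .weightInit(WeightInit.XAVIER) // explicit weight initialization
              .l2(1e-4)                      // L2 weight decay applied to all layers
              .list()
              .layer(new DenseLayer.Builder()
                      .nIn(vocabulary.size())
                      .nOut(20)
                      .activation(Activation.RELU)
                      .dropOut(0.5)          // in DL4J this is the probability of *retaining* an activation
                      .build())
              .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                      .activation(Activation.SOFTMAX)
                      .nIn(20)
                      .nOut(4)
                      .build())
              .build();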

I take it you’re using bag of words for the classification?
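
If so, a bag-of-words feature vector for one document would look roughly like this sketch (the vocabulary map, the tokenization, and the toy word list are assumptions on my part, since the featurization code isn't shown):

      import java.util.HashMap;
      import java.util.Map;

      import org.nd4j.linalg.api.ndarray.INDArray;
      import org.nd4j.linalg.factory.Nd4j;

      // word -> column index; in practice built once from all 4 x 120 training documents
      Map<String, Integer> vocabulary = new HashMap<>();
      for (String w : new String[]{"i", "want", "to", "secure", "a", "flight", "for", "my", "journey"}) {
          vocabulary.putIfAbsent(w, vocabulary.size());
      }

      // turn one document into a fixed-length term-count vector
      String document = "I want to secure a flight for my journey.";
      double[] counts = new double[vocabulary.size()];
      for (String token : document.toLowerCase().split("\\W+")) {
          Integer idx = vocabulary.get(token);
          if (idx != null) {
              counts[idx] += 1.0; // raw count; binary or TF-IDF weighting are common alternatives
          }
      }

      // shape [1, vocabulary.size()] matches nIn(vocabulary.size()) in the first dense layer
      INDArray features = Nd4j.create(counts).reshape(1, counts.length);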