Neural Network Struggling to Reproduce Tails in Bi-Modal Distribution

Hello, I wrote a DL4J program in Kotlin that uses a single neural network to generate data mimicking the training data's distribution. In this particular case, I am trying to generate data that follows a bi-modal normal distribution. The network is a 2-hidden-layer model with sigmoid activation functions.
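
For concreteness, here is a stripped-down sketch of the kind of configuration I am describing (the hidden size, seed, learning rate, weight init, and identity output activation below are placeholders and assumptions, not my exact values):

```kotlin
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.DenseLayer
import org.deeplearning4j.nn.conf.layers.OutputLayer
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.learning.config.Adam
import org.nd4j.linalg.lossfunctions.LossFunctions

// Placeholder hyper-parameters -- not my actual values.
val hiddenSize = 25
val learningRate = 1e-3

val conf = NeuralNetConfiguration.Builder()
    .seed(42L)
    .weightInit(WeightInit.XAVIER)
    .updater(Adam(learningRate))
    .list()
    // One scalar in, two sigmoid hidden layers, one scalar out.
    .layer(0, DenseLayer.Builder().nIn(1).nOut(hiddenSize)
        .activation(Activation.SIGMOID).build())
    .layer(1, DenseLayer.Builder().nIn(hiddenSize).nOut(hiddenSize)
        .activation(Activation.SIGMOID).build())
    // L1 loss = sum of absolute differences between output and label;
    // identity output activation is an assumption for this sketch.
    .layer(2, OutputLayer.Builder(LossFunctions.LossFunction.L1)
        .nIn(hiddenSize).nOut(1)
        .activation(Activation.IDENTITY).build())
    .build()

val net = MultiLayerNetwork(conf)
net.init()
```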

I select a batch of values from my training data set and sort them in increasing order. Then I generate the same number of random values between 0 and 1, which are also sorted in increasing order before being fed into the model. The network takes one value as input at a time and outputs one value. The loss function is the sum of the absolute differences between the generated and training values. In this way, I am essentially computing and minimizing the distance between the two empirical distributions, so the network should end up learning the inverse CDF (quantile function) of the training data.
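
Each batch is prepared roughly like this (a simplified sketch; `fitOneBatch`, the sampling with replacement, and the ND4J shapes are illustrative, not my exact code):

```kotlin
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.dataset.DataSet
import org.nd4j.linalg.factory.Nd4j
import kotlin.random.Random

// Sketch of one training step: sample a batch of real values, sort it,
// pair it with sorted uniform noise, and fit the network on that pairing.
fun fitOneBatch(net: MultiLayerNetwork, trainingData: DoubleArray, batchSize: Int) {
    // Sorted sample of real values: empirical quantiles of the target distribution.
    val target = DoubleArray(batchSize) { trainingData[Random.nextInt(trainingData.size)] }
        .also { it.sort() }
    // Sorted uniform noise: corresponding quantiles of the input distribution.
    val noise = DoubleArray(batchSize) { Random.nextDouble() }
        .also { it.sort() }

    // Shape [batchSize, 1]: one scalar input and one scalar label per row.
    val features = Nd4j.create(noise, intArrayOf(batchSize, 1))
    val labels = Nd4j.create(target, intArrayOf(batchSize, 1))
    net.fit(DataSet(features, labels))
}
```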

The model is trained on 20 batches of 5,000 values each, sampled from a set of 100,000 training data points, and this is repeated for a certain number of runs. I have experimented with different learning rates, network depths and sizes, batch sizes, and activation functions. The problem I run into every time is that the model is unable to reproduce the tails of the distribution, no matter how long I let it train. I got the same results with ReLU activations. Below are my generated data at the 67th run and the training data batch at the same iteration.
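
The generated data in the plot comes from feeding a fresh batch of sorted uniform values through the trained network, roughly like this (again only an illustrative sketch; `generate` is a made-up helper name):

```kotlin
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.factory.Nd4j
import kotlin.random.Random

// Sketch of the generation step: sorted uniform noise in, generated sample out.
fun generate(net: MultiLayerNetwork, count: Int): DoubleArray {
    val noise = DoubleArray(count) { Random.nextDouble() }.also { it.sort() }
    val input = Nd4j.create(noise, intArrayOf(count, 1))
    return net.output(input).toDoubleVector()
}
```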
I would really appreciate any possible solutions.

@genlarose could you post your configuration?