Transfer Learning with LSTM

Hi folks!
I am trying to use transfer learning (TL) with text data. I have an LSTM model and am looking to perform TL on it. Does dl4j support TL for LSTM models?
If yes, can someone share the relevant part of the code?

The transfer learning example (see https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/transferlearning/vgg16) applies in pretty much exactly the same way to LSTMs as well.

Feel free to also take a look at the documentation about it: https://deeplearning4j.konduit.ai/tuning-and-training/transfer-learning

If you still have trouble coming up with a solution yourself, share what you’ve tried so far, and we can see where it has gone wrong.

Thanks Treo for the update.
I have tried transfer learning with MLP and CNN models, and it appears to be working.
For some reason the LSTM does not produce the expected results, so I wanted to see if you have a specific example that shows transfer learning with an LSTM.

What exactly did you try, and how were the results unexpected?

There are no examples showing it explicitly, as it should work exactly as it does in the other cases.

  • At a high level, the accuracy of the LSTM after transfer learning does not match what was expected, even though most of the incremental training data had already been shown to the model.

Can you share how you did it? Maybe it is a bug, so I’d like to try and reproduce it.

  • These are the steps that I followed:

  1. Train and export the learned model: trained the model on the original data and exported it as a *.zip file on the local disk.
  2. Data prep: created a delta dataset that is part of the original data.
  3. Featurization: with the pretrained model in hand, defined the frozen parameters with a fine-tuning configuration and then saved that network configuration data to the local disk.

I meant the actual code. If I get something that I can actually run, I can spend the time on debugging instead of trying to get it to behave as it does in your case.

Due to the proprietary nature of the data and models, I can’t share that part of the code with you.
What I can point you to is “UCISequenceClassificationExample.java” from the dl4j examples, which would be the closest example.
I appreciate your response and help.

Hi @data-llectual. Can you expand a little on what you meant by accuracy? Do you mean that before training your fine-tuned model you were seeing a difference in accuracy from what you were expecting? Also, can you expand on what fine-tuning you did? Even a snippet of the code showing the fine-tune configuration would be helpful. Thanks.

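		import org.deeplearning4j.nn.api.OptimizationAlgorithm;
		import org.deeplearning4j.nn.conf.GradientNormalization;
		import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
		import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
		import org.deeplearning4j.nn.transferlearning.TransferLearning;
		import org.deeplearning4j.nn.transferlearning.TransferLearningHelper;
		import org.nd4j.linalg.learning.config.Nesterovs;

		// savedLocation, saveUpdater and seed are defined elsewhere (not shown)
		MultiLayerNetwork oldModel = MultiLayerNetwork.load(savedLocation, saveUpdater);
		//System.out.println(oldModel.getLayerWiseConfigurations().toJson());

		// Training settings applied to the layers that are not frozen
		FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
				.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
				.updater(new Nesterovs(5e-5))
				.biasInit(0.001)
				.gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
				.l2(0.0001)
				//.weightInit(WeightInit.DISTRIBUTION)
				.seed(seed)
				.build();

		// Freeze layers 0 and 1 (inclusive); the architecture itself is unchanged
		MultiLayerNetwork newModel = new TransferLearning.Builder(oldModel)
				.fineTuneConfiguration(fineTuneConf)
				.setFeatureExtractor(1)
				.build();

		// Helper for featurizing data through the frozen layers
		TransferLearningHelper transferLearningHelper = new TransferLearningHelper(newModel);

		//Data set iterator part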

My incremental dataset is part of the original data that I am trying to use transfer learning on, so I should get a comparable measure of accuracy before and after transfer learning.

Can you compare the parameters of the old model and the newModel before fine-tuning/training? They should be the same since you are not modifying the architecture in any way.
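
A minimal sketch of such a parameter comparison, in case it helps. Everything here is an assumption for illustration: the toy LSTM network (layer sizes, seed, updater) is a stand-in for the real model, and the class and method names (CompareParamsSketch, buildToyLstm, applyTransferLearning, sameParams) are hypothetical. With the real model you would load it with MultiLayerNetwork.load(...) instead of building one. The check compares the flattened parameters and then each layer separately, so a mismatch can be traced to a specific layer.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class CompareParamsSketch {

    /** Builds a tiny stand-in LSTM network (hypothetical sizes, not the real model). */
    static MultiLayerNetwork buildToyLstm() {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(123)
                .updater(new Nesterovs(0.005))
                .list()
                .layer(0, new LSTM.Builder().nIn(1).nOut(8)
                        .activation(Activation.TANH).build())
                .layer(1, new LSTM.Builder().nIn(8).nOut(8)
                        .activation(Activation.TANH).build())
                .layer(2, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX).nIn(8).nOut(6).build())
                .build();
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        return net;
    }

    /** Freezes layers 0 and 1 without changing the architecture, as in the snippet above. */
    static MultiLayerNetwork applyTransferLearning(MultiLayerNetwork oldModel) {
        FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
                .updater(new Nesterovs(5e-5))
                .seed(123)
                .build();
        return new TransferLearning.Builder(oldModel)
                .fineTuneConfiguration(fineTuneConf)
                .setFeatureExtractor(1)
                .build();
    }

    /** Compares the flattened parameters, then each layer, printing per-layer results. */
    static boolean sameParams(MultiLayerNetwork a, MultiLayerNetwork b) {
        boolean allEqual = a.getnLayers() == b.getnLayers()
                && a.params().equalsWithEps(b.params(), 1e-6);
        int n = Math.min(a.getnLayers(), b.getnLayers());
        for (int i = 0; i < n; i++) {
            INDArray pa = a.getLayer(i).params();
            INDArray pb = b.getLayer(i).params();
            boolean eq = pa.equalsWithEps(pb, 1e-6);
            System.out.println("layer " + i + " params equal: " + eq);
            allEqual &= eq;
        }
        return allEqual;
    }

    public static void main(String[] args) {
        MultiLayerNetwork oldModel = buildToyLstm();
        MultiLayerNetwork newModel = applyTransferLearning(oldModel);
        System.out.println("all params equal: " + sameParams(oldModel, newModel));
    }
}
```

If the parameters match before any fitting but the accuracy still differs, the next thing to check would be whether both models are evaluated on exactly the same data with the same preprocessing.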