Transfer Learning with LSTM

data-llectual · April 7, 2020, 8:12pm

Hi folks!
I am trying to use transfer learning (TL) with text data. I have an LSTM model and want to perform TL on it. Does DL4J support TL for LSTM models?
If so, could someone share the relevant code?

treo · April 8, 2020, 7:18am

The transfer learning example (see https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/transferlearning/vgg16) applies in pretty much exactly the same way to LSTMs as well.

Feel free to also take a look at the documentation about it: https://deeplearning4j.konduit.ai/tuning-and-training/transfer-learning

If you still have issues with coming up with a solution yourself, share what you’ve tried so far, and we can see where it has gone wrong.

data-llectual · April 8, 2020, 3:19pm

Thanks Treo for the update.
I have tried transfer learning with MLP and CNN models, and it appears to work.
For some reason the LSTM does not produce the expected results, so I wanted to see if you have a specific example that shows transfer learning with an LSTM.

treo · April 8, 2020, 3:35pm

What exactly did you try, and how were the results unexpected?

There are no examples showing it explicitly, as it should be working exactly as for the other cases.

data-llectual · April 8, 2020, 3:42pm

So, at a high level, the accuracy of the LSTM after transfer learning does not match what was expected, even though most of the incremental training data had already been shown to the model.

treo · April 8, 2020, 3:44pm

Can you share how you did it? Maybe it is a bug, so I’d like to try and reproduce it.

data-llectual · April 8, 2020, 10:28pm

These are the steps I followed:

Train and export the learned model: trained the model on the original data and exported it as a *.zip file on the local disk.
Data prep: created a delta dataset that is a subset of the original data.
Featurization: with the pretrained model in hand, defined the frozen parameters with a fine-tuning config and then saved that network config to the local disk.

treo · April 9, 2020, 6:14am

I meant the actual code. If I get something that I can actually run, I can spend the time on debugging instead of trying to get it to behave as it does in your case.

data-llectual · April 10, 2020, 4:03pm

Due to the proprietary nature of the data and models, I can’t share that part of the code with you.
What I can point you to is “UCISequenceClassificationExample.java” from the DL4J examples, which would be the closest example.
I appreciate your response and help.

eraly · April 10, 2020, 5:59pm

Hi @data-llectual. Can you expand a little on what you meant by accuracy? Do you mean that before training your fine tuned model you were seeing a difference in accuracy from what you were expecting? Also can you expand on what fine tuning you did? Even a snippet of the code showing the fine tune configuration would be helpful. Thanks.

data-llectual · April 10, 2020, 6:26pm

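import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.GradientNormalization;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.deeplearning4j.nn.transferlearning.TransferLearningHelper;
import org.nd4j.linalg.learning.config.Nesterovs;

// savedLocation (File), saveUpdater (boolean) and seed are defined elsewhere.
// Load the previously trained model from its *.zip file.
MultiLayerNetwork oldModel = MultiLayerNetwork.load(savedLocation, saveUpdater);
//System.out.println(oldModel.getLayerWiseConfigurations().toJson());

// Settings that apply to the layers that are not frozen.
FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Nesterovs(5e-5))
        .biasInit(0.001)
        .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
        .l2(0.0001)
        //.weightInit(WeightInit.DISTRIBUTION)
        .seed(seed)
        .build();

// setFeatureExtractor(1) freezes layer 1 and every layer before it (layers 0 and 1).
MultiLayerNetwork newModel = new TransferLearning.Builder(oldModel)
        .fineTuneConfiguration(fineTuneConf)
        .setFeatureExtractor(1)
        .build();

// The helper expects a network in which some layers are already frozen.
TransferLearningHelper transferLearningHelper = new TransferLearningHelper(newModel);

// Data set iterator part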

data-llectual · April 10, 2020, 6:30pm

My incremental dataset is a subset of the original data that I am applying transfer learning to, so I should get comparable accuracy before and after transfer learning.

eraly · April 10, 2020, 6:34pm

Can you compare the parameters of the old model and the newModel before fine tuning/training? They should be the same since you are not modifying the architecture in any way.
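
For anyone following along, the parameter check suggested above could look roughly like the sketch below. It is plain Java and assumes both networks' parameters have already been flattened into `double[]` arrays; with DL4J something like `oldModel.params().toDoubleVector()` should produce that, but treat the hookup as an assumption rather than tested code. The class `ParamDiff` and its methods are hypothetical helpers, not part of DL4J.

```java
/** Hypothetical helper (not part of DL4J): compares two flattened parameter vectors. */
public final class ParamDiff {

    private ParamDiff() {}

    /** Fails early if the two networks do not have the same number of parameters. */
    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Parameter counts differ: " + a.length + " vs " + b.length);
        }
    }

    /** Largest absolute element-wise difference between the two vectors. */
    public static double maxAbsDiff(double[] a, double[] b) {
        checkSameLength(a, b);
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /** Index of the first element that differs by more than eps, or -1 if there is none. */
    public static int firstMismatch(double[] a, double[] b, double eps) {
        checkSameLength(a, b);
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > eps) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in values; in practice these would come from oldModel and newModel.
        double[] oldParams = {0.10, -0.25, 0.40};
        double[] newParams = {0.10, -0.25, 0.40};
        System.out.println("max |diff| = " + maxAbsDiff(oldParams, newParams));
        System.out.println("first mismatch = " + firstMismatch(oldParams, newParams, 1e-6));
    }
}
```

If the maximum difference is zero before any fine tuning, the load and transfer steps left the weights untouched, so any accuracy gap would more likely come from the training or evaluation that follows. A non-zero difference, or a length mismatch, would point at the load or transfer step instead.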
