BertInferenceExample, fine-tune question

In this example, I want to fine-tune my own BERT model by following these instructions:

https://github.com/KonduitAI/dl4j-dev-tools/tree/master/import-tests/model_zoo/bert

As Google says, the eval_accuracy should be expected to be between 84% and 88%, like this:
(screenshot of the expected eval results)
but after my fine-tuning, my result is:
(screenshot of my eval results)

My accuracy is only 68%, even though I followed the fine-tuning instructions strictly; I wonder where the problem is.
I appreciate your help.

I can get 'eval_accuracy = 0.8627451' on Linux using CPU to train.

The log:

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 2751: /TF_Graphs/mrpc_output/model.ckpt-2751
I0720 10:59:38.224854 139784295372608 estimator.py:2109] Saving 'checkpoint_path' summary for global step 2751: /TF_Graphs/mrpc_output/model.ckpt-2751
INFO:tensorflow:evaluation_loop marked as finished
I0720 10:59:38.225406 139784295372608 error_handling.py:101] evaluation_loop marked as finished
INFO:tensorflow:***** Eval results *****
I0720 10:59:38.225584 139784295372608 run_classifier.py:923] ***** Eval results *****
INFO:tensorflow: eval_accuracy = 0.8627451
I0720 10:59:38.225656 139784295372608 run_classifier.py:925] eval_accuracy = 0.8627451
INFO:tensorflow: eval_loss = 0.7270211
I0720 10:59:38.225800 139784295372608 run_classifier.py:925] eval_loss = 0.7270211
INFO:tensorflow: global_step = 2751
I0720 10:59:38.225892 139784295372608 run_classifier.py:925] global_step = 2751
INFO:tensorflow: loss = 0.7270211
I0720 10:59:38.225950 139784295372608 run_classifier.py:925] loss = 0.7270211

I'll try Linux. Maybe it's caused by the system, because I fine-tuned BERT on Windows with tensorflow-gpu.

I tried to freeze chinese_L-12_H-768_A-12.zip using a command like 'python3 freezeTrainedBert.py --input_dir=/chinese_L-12_H-768_A-12 --ckpt=bert_model.ckpt' and got this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input node loss/Softmax not found in graph
Should I look into freezeTrainedBert.py?
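
One possible explanation: chinese_L-12_H-768_A-12 is the pretrained base model, so a classifier output node such as loss/Softmax would normally only exist in a checkpoint that has already been fine-tuned for classification. A quick sanity check (a sketch, assuming TF 1.x and that the unpacked directory contains bert_model.ckpt.meta) is to list the op names in the graph before trying to freeze it:

import tensorflow as tf

# Hypothetical path: the checkpoint prefix inside the unpacked chinese_L-12_H-768_A-12 directory.
ckpt_prefix = "/chinese_L-12_H-768_A-12/bert_model.ckpt"

# Rebuild the graph recorded in the .meta file and print candidate output-node names,
# to see whether a node like loss/Softmax actually exists in this checkpoint.
tf.train.import_meta_graph(ckpt_prefix + ".meta")
for op in tf.get_default_graph().get_operations():
    if "Softmax" in op.name or op.name.startswith("loss"):
        print(op.name)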

I haven't gotten to the freeze-graph step yet, but following the steps in that tutorial page it should be fine.
I re-ran the fine-tuning on Linux and found that fine-tuning on CPU works: I got eval_accuracy = 0.8480392.

So it's fine on CPU.
What I care about is how to load chinese_L-12_H-768_A-12 in order to do transfer training with custom data.

For custom data, you may need to write your own data processor.
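
If you are using Google's run_classifier.py, the usual way is to subclass its DataProcessor. A minimal sketch (assuming the google-research/bert run_classifier.py and a hypothetical tab-separated file with columns label, text_a, text_b) might look like this:

import os

from run_classifier import DataProcessor, InputExample  # from google-research/bert


class MyProcessor(DataProcessor):
  """Hypothetical processor for TSV files with columns: label, text_a, text_b."""

  def get_train_examples(self, data_dir):
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

  def get_labels(self):
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    examples = []
    for i, line in enumerate(lines):
      guid = "%s-%d" % (set_type, i)
      examples.append(InputExample(
          guid=guid, text_a=line[1], text_b=line[2], label=line[0]))
    return examples

You would then register it in the processors dict in run_classifier.py's main() so it can be selected with --task_name.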

Could you compare the results on CPU and GPU? Are you saying the numbers are different? Make sure to make it reproducible (setting a seed, same parameters, …) to see whether we have a reproducible issue here.
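
For example (a sketch, assuming TF 1.x; note that some GPU kernels are non-deterministic, so this only pins the obvious sources of randomness):

import random

import numpy as np
import tensorflow as tf

SEED = 42  # arbitrary fixed value, so repeated runs start from the same state
random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)  # graph-level seed in TF 1.x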

Do you mean my fine-tuning process? I have compared results on both CPU & GPU and on both Windows & Ubuntu. I'd like to share my experiments.

I fine-tuned BERT on the MRPC task. The experiment is presented below.

Environment
python3.6
tensorflow 1.11.0 # use this for training on CPU
tensorflow-gpu 1.11.0 # use this for training on GPU

Parameters:
max_seq_length=128
train_batch_size=4
learning_rate=2e-5
num_train_epochs=3.0
All of these parameters are listed at
https://github.com/KonduitAI/dl4j-dev-tools/tree/master/import-tests/model_zoo/bert
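
For reference, the standard way to pass these values is as command-line flags to Google's run_classifier.py, roughly like this (a sketch; $BERT_BASE_DIR and $GLUE_DIR are placeholder paths for the pretrained checkpoint and the GLUE data):

python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=4 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/TF_Graphs/mrpc_output/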

Experiment
Win10 + tensorflow 1.11.0 + CPU: eval_accuracy = 0.68
Win10 + tensorflow-gpu 1.11.0 + GPU: eval_accuracy = 0.68
Ubuntu + tensorflow 1.11.0 + CPU: eval_accuracy = 0.84
Ubuntu + tensorflow-gpu 1.11.0 + GPU: eval_accuracy = 0.68

Obviously, only on Ubuntu with CPU is the result correct. So we can only fine-tune BERT on CPU under Ubuntu, but that takes 2-3 hours per training run; on GPU it only takes about a quarter of an hour per run, but the result is wrong.
That's all of my experiments.

Oh, it's ridiculous: on my virtual machine with Ubuntu, using CPU, eval_accuracy = 0.84; then I installed Ubuntu on my computer (dual boot), and using CPU, eval_accuracy = 0.68. I'm confused :joy:

Solved. I was passing the parameters in the wrong way; I had seen that method on a blog. Following Google's instructions and passing the parameters on the command line, everything is OK.