TF 1.4 (pb) embedding operator is slow

Hi,

When I run a TF 1.4 model in pb format (via SameDiff), inference takes almost 200 ms for a single sample with DeepFM (70 features, each with a 3-dimensional embedding lookup vector). After debugging, we traced the problem to the embedding operator.
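
For context, here is a rough sketch of what the embedding lookup computes for one sample, written in plain NumPy rather than SameDiff or TensorFlow. The 70 features and 3-dimensional vectors come from the setup above; the vocabulary size, the random tables and the `embedding_lookup` helper are assumptions made only for illustration, and this is not the model's actual code.

```python
import time

import numpy as np

NUM_FEATURES = 70    # from the report: 70 features
EMBEDDING_DIM = 3    # from the report: 3-dim embedding vector per feature
VOCAB_SIZE = 1000    # assumption: per-feature vocabulary size is not stated

rng = np.random.default_rng(0)

# One embedding table per feature: shape (features, vocab, dim).
tables = rng.standard_normal(
    (NUM_FEATURES, VOCAB_SIZE, EMBEDDING_DIM)
).astype(np.float32)

# One sample: a single id per feature.
ids = rng.integers(0, VOCAB_SIZE, size=NUM_FEATURES)


def embedding_lookup(tables, ids):
    """Pick row ids[i] from table i for every feature i (a plain gather)."""
    return tables[np.arange(tables.shape[0]), ids]


start = time.perf_counter()
vectors = embedding_lookup(tables, ids)
elapsed_ms = (time.perf_counter() - start) * 1000.0

# vectors has one 3-dim row per feature; timing will vary by machine.
print(vectors.shape, f"{elapsed_ms:.3f} ms")
```

The lookup itself is only a gather of 70 small rows, which is why the latency we see in the embedding operator looks out of proportion to the work involved.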