Environment:
- Python version 3.7
- Spark version 2.4
- TensorFlow version 2.5
- TensorFlowOnSpark version 2.2.3
- Cluster version Hadoop
Describe the bug:
The evaluator node stops working after some time, while the training nodes keep running normally and the cluster as a whole does not crash. Training is configured for 80000 total steps, but the evaluator only evaluates up to 10000+ steps. After that, it outputs no more logs.

