A vehicle, and a system and method of operating the vehicle based on a gesture made by a traffic director. The system includes a camera and at least one neural network. The camera obtains an image of the traffic director. The at least one neural network is configured to generate an encoded hand vector based on a configuration of a hand of the traffic director in the image, combine a skeleton of the traffic director generated from the image with the encoded hand vector to generate a representation vector, and predict a gesture of the traffic director from the representation vector.
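The data flow described above (hand configuration to encoded hand vector, skeleton plus encoded hand vector to representation vector, representation vector to gesture) can be illustrated with a minimal sketch. It assumes that body and hand keypoints have already been extracted from the camera image by an upstream pose estimator. The keypoint counts (17 body, 21 hand), the hand-vector size, the concatenation used as the "combine" step, the single-layer networks, and the gesture labels are all hypothetical choices. The weights are random stand-ins, not trained parameters, so the predicted label is meaningless; only the structure is illustrated.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical gesture classes; the abstract does not enumerate them.
GESTURES = ["stop", "slow", "proceed"]


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(x: Sequence[float]) -> List[float]:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out: int, n_in: int, rng: random.Random):
    """Random stand-in weights; a real system would load trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class GesturePredictor:
    """Hypothetical two-stage predictor: hand encoder, then gesture classifier."""

    def __init__(self, n_hand_kp: int = 21, n_body_kp: int = 17,
                 hand_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.n_hand_kp = n_hand_kp
        self.n_body_kp = n_body_kp
        # Hand encoder: flattened (x, y) hand keypoints -> encoded hand vector.
        self.hand_layer = init_layer(hand_dim, 2 * n_hand_kp, rng)
        # Classifier: representation vector (skeleton + hand vector) -> gesture logits.
        self.cls_layer = init_layer(len(GESTURES), 2 * n_body_kp + hand_dim, rng)

    def encode_hand(self, hand_kps: Sequence[Point]) -> List[float]:
        flat = [c for p in hand_kps for c in p]
        return relu(dense(flat, *self.hand_layer))

    def represent(self, skeleton: Sequence[Point], hand_vec: Sequence[float]) -> List[float]:
        # "Combine" is modeled here as concatenation (an assumption).
        return [c for p in skeleton for c in p] + list(hand_vec)

    def predict(self, skeleton: Sequence[Point],
                hand_kps: Sequence[Point]) -> Tuple[str, List[float]]:
        if len(skeleton) != self.n_body_kp or len(hand_kps) != self.n_hand_kp:
            raise ValueError("unexpected number of keypoints")
        rep = self.represent(skeleton, self.encode_hand(hand_kps))
        probs = softmax(dense(rep, *self.cls_layer))
        best = max(range(len(probs)), key=probs.__getitem__)
        return GESTURES[best], probs


if __name__ == "__main__":
    rng = random.Random(1)
    skeleton = [(rng.random(), rng.random()) for _ in range(17)]
    hand = [(rng.random(), rng.random()) for _ in range(21)]
    label, probs = GesturePredictor().predict(skeleton, hand)
    print(label, [round(p, 3) for p in probs])
```

Keeping the hand encoder separate from the classifier mirrors the abstract's split between the encoded hand vector and the representation vector: the fine-grained hand configuration is compressed first, then fused with the coarser body skeleton before classification.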

