Like image classification networks, deep neural networks for video
classification are vulnerable to adversarial manipulation. The main difference
between image and video classifiers is that the latter usually exploit the
temporal information contained in the video. In this work we present a
manipulation scheme for fooling video classifiers by introducing a flickering
temporal perturbation that may, in some cases, be unnoticeable to human
observers and that can be implemented in the real world. After demonstrating
the manipulation of
action classification of single videos, we generalize the procedure to produce
a universal adversarial perturbation that achieves a high fooling ratio. In addition,
we extend the universal perturbation to a temporal-invariant perturbation,
which can be applied to a video without synchronizing the perturbation with the
input. We implemented the attack against several target models and
demonstrated its transferability. These properties allow us to bridge the gap
between simulated environments and real-world applications, which we
demonstrate in this paper, for the first time, with an over-the-air flickering
attack.
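To make the two central ideas concrete, here is a minimal NumPy sketch of
applying a flickering perturbation and of what "no synchronization" could mean.
It rests on assumptions not stated in the text above: the flicker is modeled as
a per-frame, per-channel color offset shared by every pixel of a frame, and
temporal invariance is modeled as robustness to a cyclic time shift of the
perturbation relative to the video. The helper `apply_flicker` and its
parameters are hypothetical names chosen for illustration, not the authors'
implementation.

```python
import numpy as np


def apply_flicker(video: np.ndarray, delta: np.ndarray, shift: int = 0) -> np.ndarray:
    """Add a flickering perturbation to a video (illustrative sketch).

    video: (T, H, W, C) array with pixel values in [0, 1].
    delta: (T, C) offsets. ASSUMPTION: the flicker changes the color of the
           whole frame uniformly, so one C-vector per frame is enough.
    shift: cyclic temporal offset between the perturbation and the video.
           ASSUMPTION: a temporal-invariant perturbation should stay
           adversarial for every shift, so the attacker need not align it
           with the start of the input.
    """
    if delta.shape != (video.shape[0], video.shape[-1]):
        raise ValueError("delta must have shape (T, C)")
    # Cyclically shift the perturbation in time: frame t gets delta[(t - shift) % T].
    shifted = np.roll(delta, shift, axis=0)
    # Broadcast (T, C) -> (T, 1, 1, C) so every pixel of a frame gets the same offset.
    perturbed = video + shifted[:, None, None, :]
    # Keep the result a valid image sequence.
    return np.clip(perturbed, 0.0, 1.0)


rng = np.random.default_rng(0)
video = rng.uniform(0.2, 0.8, size=(8, 4, 4, 3))
delta = rng.uniform(-0.05, 0.05, size=(8, 3))
out = apply_flicker(video, delta, shift=3)
# Every pixel of frame 5 is offset by the same RGB vector, delta[2].
print(np.allclose(out[5] - video[5], delta[2]))
```

Under these assumptions, optimizing a universal, temporal-invariant
perturbation would amount to minimizing the attack loss averaged over many
videos and over random values of `shift`; that training loop is omitted here.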

