Cybersecurity breaches render surveillance systems vulnerable to video
forgery attacks, in which authentic live video streams are tampered with to
conceal illegal human activities under surveillance cameras. Traditional video
forensics approaches can detect and localize forgery traces in each video frame
using computationally expensive spatial-temporal analysis, but they fall short
in real-time verification of live video feeds. Recent work correlates
time-series camera and wireless signals to recognize replayed surveillance
videos using event-level timing information, but it cannot achieve fine-grained
forgery detection and localization in each frame. To fill this gap, this paper
proposes Secure-Pose, a novel cross-modal forgery detection and localization
system for live surveillance videos using WiFi signals near the camera. We
observe that coexisting camera and WiFi signals convey common human semantic
information, and that forgery attacks on video frames break this
correspondence. Secure-Pose extracts effective human pose
features from the synchronized multi-modal signals, then detects and localizes
forgery traces in each frame under both inter-frame and intra-frame attacks. We
implement Secure-Pose using a commercial camera and two Intel 5300 NICs and
evaluate it in real-world environments. Secure-Pose achieves a high detection
accuracy of 95.1% and can effectively localize tampered objects under different
forgery attacks.

