TinyStack is a novel way to deploy GPU-accelerated computation on mobile
and embedded devices. It addresses the high complexity of a modern GPU stack.
Without an overhaul of the stack, TinyStack provides a static, fast path for an
app to push its computation to GPU. It records GPU executions on the full GPU
stack ahead of time and replays the executions with only a small replayer on
new input at run time. TinyStack addresses challenges in capturing key CPU/GPU
interactions and GPU states, working around proprietary GPU internals, and
preventing replay divergence. The resulting replayer is a drop-in replacement
for the original GPU stack. It is tiny (an executable as small as 50 KB), robust
(replaying long executions without divergence), portable (running in a POSIX
OS, in a TEE, or on bare metal), and quick to launch (speeding up startup by up
to two orders of magnitude). We have implemented TinyStack and tested it with a
variety of ML frameworks, GPU programming APIs, and integrated GPUs.
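
The record-then-replay idea described above can be illustrated with a toy sketch. This is not TinyStack's implementation: the `FakeGPU` class, the `START` and `STATUS` register names, the action tuples, and the `replay` function are all hypothetical, chosen only to show how a pre-captured sequence of CPU/GPU interactions might be re-run on new input while checking for divergence.

```python
from dataclasses import dataclass, field


@dataclass
class FakeGPU:
    """Hypothetical stand-in for hardware: a register file plus named buffers."""
    regs: dict = field(default_factory=dict)
    mem: dict = field(default_factory=dict)

    def write_reg(self, addr, value):
        self.regs[addr] = value
        # Toy "job": writing 1 to the made-up START register doubles the input.
        if addr == "START" and value == 1:
            self.mem["out"] = [2 * x for x in self.mem["in"]]
            self.regs["STATUS"] = "DONE"

    def read_reg(self, addr):
        return self.regs.get(addr)


# A recording: ordered actions assumed to be captured ahead of time
# while the workload ran on the full GPU stack.
RECORDING = [
    ("upload", "in", None),            # slot filled with new input at replay
    ("write_reg", "START", 1),         # kick off the job
    ("expect_reg", "STATUS", "DONE"),  # divergence check against recorded state
    ("download", "out", None),         # fetch the result buffer
]


def replay(recording, gpu, new_input):
    """Re-run the recorded actions on `gpu`, substituting `new_input`."""
    result = None
    for op, target, value in recording:
        if op == "upload":
            gpu.mem[target] = list(new_input)
        elif op == "write_reg":
            gpu.write_reg(target, value)
        elif op == "expect_reg":
            if gpu.read_reg(target) != value:
                raise RuntimeError(f"replay diverged at register {target}")
        elif op == "download":
            result = gpu.mem[target]
        else:
            raise ValueError(f"unknown action {op}")
    return result
```

In this sketch the replayer needs no knowledge of what the job computes; it only walks the recorded actions, which is the property that lets a replayer stay small. Calling `replay(RECORDING, FakeGPU(), [1, 2, 3])` runs the toy job on fresh input.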
