We give an $(varepsilon,delta)$-differentially private algorithm for the
multi-armed bandit (MAB) problem in the shuffle model with a
distribution-dependent regret of $Oleft(left(sum_{ain
T}{varepsilon}right)$, and a distribution-independent regret of
$Oleft(sqrt{kTlog T}+frac{ksqrt{logfrac{1}{delta}}log
T}{varepsilon}right)$, where $T$ is the number of rounds, $Delta_a$ is the
suboptimality gap of the arm $a$, and $k$ is the total number of arms. Our
upper bound almost matches the regret of the best known algorithms for the
centralized model, and significantly outperforms the best known algorithm in
the local model.

By admin