Regret Bounds for Satisficing in Multi-Armed Bandit Problems
Research output: Contribution to journal › Article › Research › peer-review
Standard
In: Transactions on machine learning research, Vol. 2023, No. August, 07.06.2023.
Research output: Contribution to journal › Article › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex - Download
}
RIS (suitable for import to EndNote) - Download
TY - JOUR
T1 - Regret Bounds for Satisficing in Multi-Armed Bandit Problems
AU - Michel, Thomas
AU - Hajiabolhassan, Hossein
AU - Ortner, Ronald
PY - 2023/6/7
Y1 - 2023/6/7
N2 - This paper considers the objective of \textit{satisficing} in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case when such a satisficing arm exists as well as for the general case when this may not be the case. Introducing the notion of \textit{satisficing regret}, our main result shows that in the general case it is possible to obtain constant satisficing regret when there is a satisficing arm (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.
AB - This paper considers the objective of \textit{satisficing} in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case when such a satisficing arm exists as well as for the general case when this may not be the case. Introducing the notion of \textit{satisficing regret}, our main result shows that in the general case it is possible to obtain constant satisficing regret when there is a satisficing arm (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.
M3 - Article
VL - 2023
JO - Transactions on machine learning research
JF - Transactions on machine learning research
SN - 2835-8856
IS - August
ER -