Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Thomas Michel; Hossein Hajiabolhassan; Ronald Ortner

Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Research output: Contribution to journal › Article › Research › peer-review

Standard

Regret Bounds for Satisficing in Multi-Armed Bandit Problems. / Michel, Thomas; Hajiabolhassan, Hossein ; Ortner, Ronald.
In: Transactions on machine learning research, Vol. 2023, No. August, 07.06.2023.

Research output: Contribution to journal › Article › Research › peer-review

Bibtex - Download

@article{745c321f5100434fa2a990c69cddc08e,

title = "Regret Bounds for Satisficing in Multi-Armed Bandit Problems",

abstract = "This paper considers the objective of \textit{satisficing} in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case when such a satisficing arm exists as well as for the general case when this may not be the case. Introducing the notion of \textit{satisficing regret}, our main result shows that in the general case it is possible to obtain constant satisficing regret when there is a satisficing arm (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.",

author = "Thomas Michel and Hossein Hajiabolhassan and Ronald Ortner",

year = "2023",

month = jun,

day = "7",

language = "English",

volume = "2023",

journal = "Transactions on machine learning research",

issn = "2835-8856",

number = "August",

}

RIS (suitable for import to EndNote) - Download

TY - JOUR

T1 - Regret Bounds for Satisficing in Multi-Armed Bandit Problems

AU - Michel, Thomas

AU - Hajiabolhassan, Hossein

AU - Ortner, Ronald

PY - 2023/6/7

Y1 - 2023/6/7

N2 - This paper considers the objective of \textit{satisficing} in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case when such a satisficing arm exists as well as for the general case when this may not be the case. Introducing the notion of \textit{satisficing regret}, our main result shows that in the general case it is possible to obtain constant satisficing regret when there is a satisficing arm (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.

AB - This paper considers the objective of \textit{satisficing} in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case when such a satisficing arm exists as well as for the general case when this may not be the case. Introducing the notion of \textit{satisficing regret}, our main result shows that in the general case it is possible to obtain constant satisficing regret when there is a satisficing arm (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.

M3 - Article

VL - 2023

JO - Transactions on machine learning research

JF - Transactions on machine learning research

SN - 2835-8856

IS - August

ER -

Research Portal

Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Standard

Harvard

APA

Vancouver

Author

Bibtex - Download

RIS (suitable for import to EndNote) - Download

Links

SFX

Research Portal

Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Standard

Harvard

APA

Vancouver

Author

Bibtex - Download

RIS (suitable for import to EndNote) - Download

Links

SFX

By the same authors

Online Regret Bounds for Satisficing in MDPs

Online Regret Bounds for Satisficing in MDPs

Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Adaptive Algorithms for Meta-Induction

FunQG: Molecular Representation Learning Via Quotient Graphs