In what kind of real-life situations can we use a multi-armed bandit algorithm?

Multi-armed bandits work well in situations where you have several choices and are not sure which one will maximize your well-being. You can use the algorithm for some real-life situations. Learning, for example, can be a good field:

If a kid is learning carpentry and is bad at it, the algorithm will suggest that they should probably move on. If they are good at it, it will tell them to keep studying that field.

Dating is also a good field:

You’re a man putting a lot of ‘effort’ into pursuing a lady, but your efforts are clearly unwelcome. The algorithm should “slightly” (or strongly) nudge you to move on.

What other real-life situations can we use the multi-armed bandit algorithm for?

PS: If the question is too broad, please leave a comment. If there is a consensus, I’ll remove my question.


When you play the original Pokémon games (Red, Blue, or Yellow) and get to Celadon City, the Team Rocket slot machines have different odds. That’s a multi-armed bandit right there if you want to optimize getting that Porygon really fast.
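To make the slot-machine framing concrete, here is a minimal epsilon-greedy sketch (the payout rates are made up for illustration): each machine is an arm, and with probability epsilon you explore a random machine, otherwise you pull the one with the best observed average.

```python
import random

def epsilon_greedy(true_payouts, pulls=10000, epsilon=0.1, seed=0):
    """Play slot machines with unknown Bernoulli payout rates.

    true_payouts: hypothetical per-machine win probabilities (hidden
    from the player; only used here to simulate rewards).
    """
    rng = random.Random(seed)
    n = len(true_payouts)
    counts = [0] * n      # how often each machine was played
    values = [0.0] * n    # running mean reward per machine
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda i: values[i])  # exploit
        reward = 1.0 if rng.random() < true_payouts[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

# Three machines with (hypothetical) payout rates; the algorithm
# should concentrate most of its pulls on the 0.20 machine.
counts, values = epsilon_greedy([0.05, 0.10, 0.20])
```

Over enough pulls the best machine accumulates the vast majority of plays, which is exactly the explore/exploit trade-off the answer is pointing at.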

In all seriousness, people talk about this problem when choosing tuning variables in machine learning. Especially when you have a lot of variables, exploration vs. exploitation comes up. See Spearmint, or the recent paper on this topic that uses a very simple algorithm to choose tuning parameters (and far outperforms other tuning techniques).
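One way to connect hyperparameter tuning to bandits is to treat each candidate setting as an arm and use UCB1 to balance trying under-sampled settings against re-running promising ones. The sketch below simulates noisy "validation scores" as Bernoulli draws; the per-setting success rates are invented for illustration, not taken from any real tuner.

```python
import math
import random

def ucb1(arm_means, rounds=2000, seed=1):
    """UCB1 over candidate settings with unknown mean reward.

    arm_means: hypothetical mean reward of each setting, used only to
    simulate noisy evaluations (e.g. a model's validation score).
    """
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n
    means = [0.0] * n
    # Play each arm once to initialize its estimate.
    for i in range(n):
        counts[i] = 1
        means[i] = 1.0 if rng.random() < arm_means[i] else 0.0
    for t in range(n, rounds):
        # Pick the arm with the highest upper confidence bound.
        arm = max(range(n),
                  key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts

# Hypothetical success rates for three learning-rate choices;
# UCB1 should spend most evaluations on the best one.
counts = ucb1([0.60, 0.70, 0.85])
```

The confidence term shrinks as an arm is sampled more, so the algorithm automatically shifts from exploring all settings to exploiting the best one.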

