Price optimization is a complex task, especially for a two-sided marketplace where maintaining the balance between demand and supply is key to the health of the business. In the case of Cabify, deviations from that equilibrium can be harmful in several ways. Raising prices too much will dissuade users from requesting rides, while lowering them too far may overburden our fleet or, in the worst case, cost us our drivers' interest in favor of more profitable competitors. The challenge of finding the perfect balance point is made even more arduous by its non-stationarity: dramatic shocks like the ongoing global pandemic can shift the market to a different state, and we must be able to adapt promptly.

At Cabify, we have teams of experts who analyze the market and decide the best prices for each city based on their intuition and knowledge of local particularities. In addition, we have developed tools that enable our experts to quantitatively evaluate their decisions in controlled experiments, in which new pricing schemes are tested on a random subset of users to understand how they react. While this approach has proven successful so far, it is showing its limitations in scalability and readiness for a business that is rapidly growing in size and complexity. We cannot rely so heavily on human experts for every step of the process anymore: we need to step up our game and build an automatic pricing system.

The core challenge is making automatic decisions under high uncertainty. For this reason, we started exploring the use of Reinforcement Learning techniques for price optimization. The problem at hand consists of choosing the best price for each type of ride, with the handicap of not knowing in advance how users will respond to those prices.
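To make the decision problem concrete, here is a toy simulation; all prices, acceptance rates, and round counts below are made up for illustration and are not Cabify data. An agent repeatedly offers one of a few candidate prices, only observes whether each ride is accepted, and has to trade off exploring uncertain prices against exploiting the best-looking one (here with a simple epsilon-greedy rule):

```python
import random

random.seed(42)

# Hypothetical candidate prices for one ride type, and hidden acceptance
# probabilities the agent does NOT know (cheaper => more likely accepted).
PRICES = [4.0, 5.0, 6.0, 7.0, 8.0]
TRUE_ACCEPT = [0.90, 0.80, 0.60, 0.35, 0.15]

n_offers = [0] * len(PRICES)    # how many times each price was offered
revenue = [0.0] * len(PRICES)   # revenue collected at each price

def choose_price(eps: float = 0.1) -> int:
    """Epsilon-greedy: usually exploit the price with the best observed
    revenue per offer, sometimes explore a random one."""
    if random.random() < eps or sum(n_offers) == 0:
        return random.randrange(len(PRICES))
    avg = [revenue[i] / n_offers[i] if n_offers[i] else 0.0
           for i in range(len(PRICES))]
    return max(range(len(PRICES)), key=lambda i: avg[i])

for _ in range(10_000):
    i = choose_price()
    n_offers[i] += 1
    if random.random() < TRUE_ACCEPT[i]:  # the user accepts the ride
        revenue[i] += PRICES[i]

best = max(range(len(PRICES)), key=lambda i: n_offers[i])
print(f"price offered most often: {PRICES[best]:.2f}")
```

The interesting quantity is revenue per offer (price times acceptance rate), not acceptance rate alone: the cheapest price is accepted most often but is not necessarily the most profitable.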
Our problem resembles the Multi-Armed Bandit problem, which consists of allocating a fixed set of resources among competing choices without knowing which one is best. We believe this framework, widely used in online marketing to serve ads, can be adapted to the pricing task. We tested different algorithms and, in the end, built a custom solution that leverages the natural structure of the problem: the cheaper a price is, the more likely it is to be accepted.

Once the automatic pricing system was built, we tested it in a controlled experiment before fully enabling it. We were pleased to find that it worked as expected: by raising or lowering prices, the system was able to find better price points that significantly improved our drivers' earnings.

While taking this step forward, however, we could already see the direction the next one should take. The data from our experiments revealed that not all metrics improved: in some cases, mid-term user retention was negatively affected by the new prices, even though they brought better revenue in the short term. A high price influences not only a user's immediate decision about a journey, but also their overall experience and the likelihood of using Cabify in the future. With a myopic solution we were ignoring the impact on long-term retention, thereby risking losing users in the long run. We are currently improving the solution to include these long-term factors in the pricing decision.

While still a work in progress, this solution is a good example of how Reinforcement Learning is not only good for creating invincible game-playing AIs, but also for having a positive impact on real-world business.
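As an illustration of the structural idea mentioned above (a cheaper price is more likely to be accepted), here is one hypothetical way a bandit can exploit it. This is a simplified sketch with invented numbers, not our production algorithm: Thompson sampling keeps a Beta posterior per candidate price, and each sampled acceptance curve is projected onto the set of non-increasing curves (pool-adjacent-violators) before picking the price with the best sampled expected revenue.

```python
import random

random.seed(7)

# Hypothetical price grid and hidden, monotone acceptance probabilities.
PRICES = [4.0, 5.0, 6.0, 7.0, 8.0]
TRUE_ACCEPT = [0.90, 0.80, 0.60, 0.35, 0.15]

# Beta(1, 1) posterior per price: alpha counts accepts, beta counts rejects.
alpha = [1.0] * len(PRICES)
beta = [1.0] * len(PRICES)

def project_non_increasing(xs):
    """Pool-adjacent-violators: the least-squares projection of xs onto
    non-increasing sequences (curve respects 'cheaper is accepted more')."""
    blocks = []  # list of (mean, count) blocks
    for x in xs:
        blocks.append((x, 1))
        # Merge while an earlier block is smaller than a later one.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            (m2, c2), (m1, c1) = blocks[-2], blocks[-1]
            blocks[-2:] = [((m2 * c2 + m1 * c1) / (c2 + c1), c2 + c1)]
    out = []
    for m, c in blocks:
        out.extend([m] * c)
    return out

def choose_price() -> int:
    # Sample an acceptance curve from the posteriors ...
    sampled = [random.betavariate(alpha[i], beta[i]) for i in range(len(PRICES))]
    # ... force it to be monotone in price ...
    curve = project_non_increasing(sampled)
    # ... and pick the price with the best sampled expected revenue.
    return max(range(len(PRICES)), key=lambda i: PRICES[i] * curve[i])

for _ in range(5_000):
    i = choose_price()
    if random.random() < TRUE_ACCEPT[i]:
        alpha[i] += 1
    else:
        beta[i] += 1
```

The projection means an implausible sample (say, a high price looking more acceptable than a cheap one) is smoothed away before the decision, so exploration is not wasted on curves that contradict the known shape of demand.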