Quantum thermal machines are micro-scale devices that convert between heat and work by exploiting quantum effects. Controlling such systems so as to maximize their performance is an extremely challenging task. Here we develop a mathematical
framework, based on reinforcement learning, to optimally control quantum thermal machines using quantum measurements and feedback. The method finds Pareto-optimal tradeoffs between high power, high efficiency, and low power fluctuations.
Applications to real-world quantum devices are foreseen.