Hardware reinforcement learning implementation for edge computing applications
Abstract
This study presents an applied research project, exploratory and bibliographic in emphasis, that aims to implement a Reinforcement Learning algorithm in an edge computing system, starting in its initial phase with the ESP32 microcontroller. Reinforcement Learning is a branch of Artificial Intelligence that enables autonomous agents to make decisions in dynamic environments through continuous interaction with their surroundings. Although it has significant potential across many applications, its implementation still depends largely on software platforms and cloud computing, which can introduce connection instability, increased latency, and security vulnerabilities. The objective of this study is to analyze alternatives that allow local, integrated execution of the algorithm, combining portability, efficiency, and data security. The ESP32 was selected as the initial platform for its strong computational performance, compact form factor, extensive connectivity, and low cost, which make it suitable for embedded applications. This strategy aims to confirm the feasibility of using simple, inexpensive hardware as a first stage before the future FPGA implementation planned in the project.
Copyright (c) 2025 Miguel Angelo de Abreu de Sousa, Pedro dos Prazeres Marques, Olívia Furlani Camargo de Souza, Felipe Neves de Sousa Lima, Ricardo Pires

This work is licensed under a Creative Commons Attribution 4.0 International License.
All works published in REGRASP are licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).