Hardware reinforcement learning implementation for edge computing applications

Keywords: artificial intelligence, reinforcement learning, edge computing, microcontrollers

Abstract

This study presents an applied research project, exploratory and bibliographic in emphasis, that aims to implement a Reinforcement Learning algorithm in an edge computing system, using the ESP32 microcontroller in its initial phase. Reinforcement Learning is a branch of Artificial Intelligence in which autonomous agents learn to make decisions in dynamic environments through continuous interaction with their surroundings. Despite its potential across many applications, its implementation still depends largely on software platforms and cloud computing, which exposes it to connection instability, increased latency, and security vulnerabilities. The objective of this study is to analyze alternatives that allow the algorithm to run locally on integrated hardware, combining portability, efficiency, and data security. The ESP32 was selected as the initial platform for its computational performance, compact form factor, built-in Wi-Fi and Bluetooth connectivity, and low cost, which make it suitable for embedded applications. This strategy aims to confirm the feasibility of using simple, inexpensive hardware as a first stage before the future FPGA implementation planned in the project.

Author Biographies

Pedro dos Prazeres Marques, IFSP - Campus São Paulo

Seventh-semester student in Control and Automation Engineering and recipient of a PIBIFISP 2025 scientific initiation scholarship.

Olívia Furlani Camargo de Souza, IFSP - Campus São Paulo

Seventh-semester student in Control and Automation Engineering and participant in the Artificial Intelligence study group in 2025.

Felipe Neves de Sousa Lima, IFSP - Campus São Paulo

Seventh-semester student in Control and Automation Engineering and participant in the Artificial Intelligence study group in 2025.

Ricardo Pires, IFSP - Campus São Paulo

He holds a degree in Electrical Engineering from the Polytechnic School of the University of São Paulo (1991), a master's degree in Electrical Engineering from the Polytechnic School of the University of São Paulo (1994), and a doctorate in Automatic and Microelectronic Systems from the Université de Montpellier II (Sciences et Techniques du Languedoc), France (1998). He is currently a professor at the Federal Institute of Education, Science and Technology of São Paulo.

Miguel Angelo de Abreu de Sousa, IFSP - Campus São Paulo

He holds a PhD and a master's degree in Electrical Engineering from the Polytechnic School of the University of São Paulo (POLI-USP). He also has a degree in Electrical Engineering from the São Paulo Engineering Faculty and a degree in Electronics Technology from Mackenzie Presbyterian University. He is currently a professor in the Electrical Engineering Department at the Federal Institute of Education, Science and Technology of São Paulo (IFSP) and a member of the AI - Advanced Institute for Artificial Intelligence. His interests include the study of Intelligent Systems, electrical circuit architectures for the implementation of neural computing models, and, more recently, ethics in Artificial Intelligence.

Published

2025-12-04

How to Cite

Marques, P., Souza, O., Lima, F., Pires, R., & Sousa, M. (2025). Hardware reinforcement learning implementation for edge computing applications. Revista Para Graduandos/Instituto Federal De Educação, Ciência E Tecnologia De São Paulo - Campus São Paulo - REGRASP, 10(4), 40–46. https://doi.org/10.47734/regrasp.v10.04.p40-46