Latency and Resource Trade-off Analysis of AXI-DMA and BRAM Integration Approaches on SoC-FPGA

Güner Tatar

doi:10.62520/fujece.1790038

Research Article

Year 2026, Volume 5, Issue 1, pp. 316-329, 28.02.2026

https://izlik.org/JA67SG57PJ

Abstract

This paper presents a comparative evaluation of two integration strategies for the Xilinx Zynq-7000 System-on-Chip (SoC): an Advanced eXtensible Interface-Direct Memory Access (AXI-DMA)-based architecture and a Block RAM (BRAM)-based architecture. Both designs employ a custom processing element (PE) for arithmetic operations, but they differ significantly in their data transfer and buffering mechanisms. In the AXI-DMA design, communication between the processing system (PS) and the programmable logic (PL) takes place over an AXI4-Stream interface controlled by a DMA engine. In contrast, the BRAM-based design uses dual-port block memories accessed through AXI BRAM controllers, enabling direct operand access. Implementation results indicate that both designs comfortably meet the resource constraints of the XC7Z020 device. However, the AXI-DMA-based architecture exhibits higher hardware resource utilization, consuming on average approximately 54% more resources than the BRAM-based design. Performance analysis reveals a pronounced latency difference: the AXI-DMA design required an average of ~1.19 ms per operation, whereas the BRAM-based approach required ~0.10 ms, reducing the total execution time from 359,919 µs to 32,487 µs (roughly an 11-fold reduction).
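As a quick consistency check on the reported figures, the two totals can be turned into a speedup and a percentage reduction. The sketch below additionally assumes (the abstract does not say so explicitly) that both totals cover the same workload, and uses the AXI-DMA average of ~1.19 ms to estimate how many operations that workload contains. The constant and function names are illustrative, not taken from the paper.

```python
# Reported totals from the abstract, in microseconds.
AXI_DMA_TOTAL_US = 359_919
BRAM_TOTAL_US = 32_487
# Reported average per-operation latency of the AXI-DMA design, in microseconds.
AXI_DMA_PER_OP_US = 1_190


def speedup(baseline_us: float, improved_us: float) -> float:
    """How many times faster the improved design is than the baseline."""
    return baseline_us / improved_us


def percent_reduction(baseline_us: float, improved_us: float) -> float:
    """Reduction in execution time as a percentage of the baseline time."""
    return 100.0 * (baseline_us - improved_us) / baseline_us


# Assumption: both totals cover the same number of operations, so the
# operation count can be estimated from the AXI-DMA total and its average.
implied_ops = AXI_DMA_TOTAL_US / AXI_DMA_PER_OP_US
bram_per_op_us = BRAM_TOTAL_US / implied_ops

print(f"speedup:          {speedup(AXI_DMA_TOTAL_US, BRAM_TOTAL_US):.2f}x")
print(f"time reduction:   {percent_reduction(AXI_DMA_TOTAL_US, BRAM_TOTAL_US):.1f}%")
print(f"implied ops:      {implied_ops:.0f}")
print(f"BRAM per op (us): {bram_per_op_us:.0f}")
```

Under that assumption the totals correspond to roughly an 11-fold speedup (about a 91% reduction in execution time) over roughly 300 operations, which puts the BRAM design on the order of 0.1 ms per operation.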
These findings demonstrate a clear trade-off between scalability and latency. While AXI-DMA provides flexibility and throughput for stream-oriented applications, BRAM-based integration delivers superior efficiency in small-scale, low-latency scenarios. The study offers practical guidance for the design of Field-Programmable Gate Array (FPGA)-based accelerators on heterogeneous computing platforms.
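One common first-order way to reason about this scalability-versus-latency trade-off (a generic model, not one taken from the paper) is to give each data path a fixed per-transfer cost plus a per-word cost. A DMA transfer pays a larger fixed cost (configuring the engine, cache maintenance, waiting for completion) but moves each word cheaply, while memory-mapped writes into BRAM have almost no setup cost but pay more per word. The sketch below finds the transfer size at which the two lines cross; `LinkModel`, `crossover_words`, and every numeric parameter are hypothetical placeholders rather than measurements.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """First-order transfer-cost model: fixed setup cost plus a per-word cost.

    The parameter values used below are hypothetical, not measured.
    """
    setup_us: float     # one-off cost per transfer (setup, cache maintenance, completion wait)
    per_word_us: float  # incremental cost for each 32-bit word moved

    def cost_us(self, words: int) -> float:
        return self.setup_us + self.per_word_us * words


def crossover_words(high_setup: LinkModel, low_setup: LinkModel) -> int:
    """Smallest transfer size (in words) at which high_setup is strictly cheaper.

    Assumes high_setup has the larger fixed cost and the smaller per-word cost.
    """
    if high_setup.per_word_us >= low_setup.per_word_us:
        raise ValueError("high-setup link never amortizes its fixed cost")
    n = (high_setup.setup_us - low_setup.setup_us) / (
        low_setup.per_word_us - high_setup.per_word_us
    )
    return math.floor(n) + 1


# Hypothetical parameters chosen only to illustrate the shape of the trade-off.
axi_dma = LinkModel(setup_us=20.0, per_word_us=0.01)
axi_bram = LinkModel(setup_us=0.0, per_word_us=0.25)

n_star = crossover_words(axi_dma, axi_bram)
for words in (16, n_star, 4096):
    print(words, round(axi_dma.cost_us(words), 2), round(axi_bram.cost_us(words), 2))
```

With these made-up numbers, BRAM writes are cheaper below 84 words and DMA is cheaper from 84 words upward. In a real design the crossover depends on the clock frequency, the interconnect path, and the driver overhead, and would have to be measured.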

Keywords

SoC-FPGA integration strategy, AXI-DMA architecture, BRAM-based design, resource utilization and latency analysis, FPGA-based heterogeneous computing.

Ethics Statement

“Ethics committee approval is not required for the prepared article.” “There is no conflict of interest with any person/institution in the prepared article.”

Turkish-language version: Latency and Resource Consumption Analysis of AXI-DMA and BRAM-Based Integration Methods in the SoC-FPGA Environment

Abstract

This paper presents a comparative evaluation of two integration strategies designed on the Xilinx Zynq-7000 SoC. The study examines an Advanced eXtensible Interface-Direct Memory Access (AXI-DMA)-based architecture and a Block RAM (BRAM)-based architecture in terms of performance, resource utilization, and latency. Both designs include a processing element developed specifically for arithmetic operations; however, they differ significantly in their data transfer and buffering mechanisms. In the AXI-DMA-based design, communication between the processing system and the programmable logic (PL) is provided through an AXI4-Stream interface controlled by a DMA engine. In the BRAM-based architecture, by contrast, dual-port block memories accessed through AXI BRAM controllers are used, which enables direct data access.
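The two data paths described above can be contrasted with a small behavioural model. On the AXI-DMA side, the sketch follows the AXI4-Stream rule that a beat is transferred only in a cycle where TVALID and TREADY are both high, with TLAST marking the final beat of a packet. On the BRAM side, two ports share one memory array, so the PS and the PE can exchange operands by address. This is not cycle-accurate RTL: the TREADY trace, the memory depth, the operand layout, the assumption that the PE uses the second BRAM port, and the use of addition as the PE operation are all hypothetical, and write collisions between the two ports are not modelled.

```python
from typing import List, Sequence, Tuple


# --- AXI-DMA path: simplified AXI4-Stream handshake ------------------------
def stream_packet(data: Sequence[int], tready: Sequence[bool]) -> Tuple[List[Tuple[int, bool]], int]:
    """Send one packet over an AXI4-Stream link, at most one beat per cycle.

    Assumes the master (e.g. a DMA read channel) keeps TVALID high while it
    has data, so a beat transfers in every cycle where TREADY is high.
    `tready` is a hypothetical per-cycle trace of the slave's ready signal.
    Returns the received (TDATA, TLAST) beats and the cycles taken.
    """
    if not data:
        return [], 0
    beats: List[Tuple[int, bool]] = []
    for cycle, ready in enumerate(tready):
        if ready:  # TVALID is assumed high, so TREADY alone completes the handshake
            last = len(beats) == len(data) - 1
            beats.append((data[len(beats)], last))
            if last:
                return beats, cycle + 1
    raise RuntimeError("TREADY trace ended before the TLAST beat was transferred")


# --- BRAM path: two ports over one shared memory array ---------------------
class BramPort:
    """One port of a dual-port BRAM; both ports see the same stored words."""

    def __init__(self, mem: List[int]) -> None:
        self._mem = mem

    def write(self, addr: int, value: int) -> None:
        self._mem[addr] = value & 0xFFFF_FFFF  # store as a 32-bit word

    def read(self, addr: int) -> int:
        return self._mem[addr]


class DualPortBram:
    def __init__(self, depth: int) -> None:
        mem = [0] * depth
        self.port_a = BramPort(mem)  # PS side, behind an AXI BRAM controller
        self.port_b = BramPort(mem)  # PE side (hypothetical assignment)


# Usage with hypothetical operands and a hypothetical "add" in the PE.
beats, cycles = stream_packet([7, 5], tready=[False, True, False, True])
stream_result = sum(d for d, _ in beats)

bram = DualPortBram(depth=4)
bram.port_a.write(0, 7)  # PS writes operand A
bram.port_a.write(1, 5)  # PS writes operand B
bram.port_b.write(2, bram.port_b.read(0) + bram.port_b.read(1))  # PE computes
bram_result = bram.port_a.read(2)  # PS reads the result back
```

In the stream case the packet needs four cycles because TREADY is low in two of them; in the BRAM case no handshake is modelled at all, which mirrors the "direct operand access" described in the abstract.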
Implementation results show that both architectures satisfy the resource constraints of the XC7Z020 device. However, the hardware resource consumption of the AXI-DMA-based design was found to be on average 54% higher than that of the BRAM-based architecture. The performance analyses also reveal a striking difference in latency: while the AXI-DMA architecture required an average of 1.19 ms per operation, the BRAM-based approach required approximately 0.10 ms and reduced the total execution time from 359,919 µs to 32,487 µs.
The findings reveal a clear trade-off between scalability and latency. The AXI-DMA architecture offers flexibility and high data-transfer capacity in stream-oriented applications, whereas the BRAM-based architecture provides higher efficiency in smaller-scale scenarios that require low latency. The study presents results that can guide the design of Field-Programmable Gate Array (FPGA)-based accelerators on heterogeneous computing platforms.

There are 17 references in total.

Details

Primary Language	English
Subjects	Computer Software, Programming Languages
Section	Research Article
Authors	Güner Tatar 0000-0002-3664-1366
Submission Date	23 September 2025
Acceptance Date	22 January 2026
Publication Date	28 February 2026
DOI	https://doi.org/10.62520/fujece.1790038
IZ	https://izlik.org/JA67SG57PJ
Published in Issue	Year 2026, Volume 5, Issue 1

Cite

APA	Tatar, G. (2026). Latency and Resource Trade-off Analysis of AXI-DMA and BRAM Integration Approaches on SoC-FPGA. Firat University Journal of Experimental and Computational Engineering, 5(1), 316-329. https://doi.org/10.62520/fujece.1790038

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).