Research Article

Sparse Matrix-Vector Multiplication Based on Duff's Device

Year 2020, Volume 22, Issue 65, 315–324, 15.05.2020
https://doi.org/10.21205/deufmd.2020226501

Abstract

Sparse matrix-vector multiplication (SpMV) is an operation used frequently in many engineering problems and in scientific computing. Accelerating SpMV positively impacts a wide range of applications. In this paper, we examine the effect of the loop-unrolling technique known as Duff's device on the performance of SpMV. The Duff's-device-based SpMV implementation we propose can be used after a low-cost preprocessing of the CSR format, the most widely used sparse matrix storage format. We conducted an experimental evaluation on a dataset of matrices drawn from real-world problems and observed that substantial speedup can be achieved.

References

  • Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, Demmel J. Optimization of Sparse Matrix-vector Multiplication on Emerging Multicore Platforms. Parallel Comput 2009;35:178–94. doi:10.1016/j.parco.2008.12.006.
  • Filippone S, Cardellini V, Barbieri D, Fanfarillo A. Sparse Matrix-Vector Multiplication on GPGPUs. ACM Trans Math Softw 2017;43:30:1--30:49. doi:10.1145/3017994.
  • Langr D, Tvrdik P. Evaluation Criteria for Sparse Matrix Storage Formats. IEEE Trans Parallel Distrib Syst 2016;27:428–40. doi:10.1109/TPDS.2015.2401575.
  • Goumas GI, Kourtis K, Anastopoulos N, Karakasis V, Koziris N. Understanding the Performance of Sparse Matrix-Vector Multiplication. 16th Euromicro Int. Conf. Parallel, Distrib. Network-Based Process., 2008, p. 283–92. doi:10.1109/PDP.2008.41.
  • Liu X, Smelyanskiy M, Chow E, Dubey P. Efficient Sparse Matrix-vector Multiplication on x86-based Many-core Processors. Proc. 27th Int. ACM Conf. Int. Conf. Supercomput., New York, NY, USA: ACM; 2013, p. 273–82. doi:10.1145/2464996.2465013.
  • Belgin M, Back G, Ribbens CJ. A Library for Pattern-based Sparse Matrix Vector Multiply. Int J Parallel Program 2011;39:62–87. doi:10.1007/s10766-010-0145-2.
  • Liu W, Vinter B. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. Proc. 29th ACM Int. Conf. Supercomput., New York, NY, USA: ACM; 2015, p. 339–50. doi:10.1145/2751205.2751209.
  • Vuduc R, Demmel JW, Yelick KA. OSKI: A library of automatically tuned sparse matrix kernels. J Phys Conf Ser 2005;16:521. doi:10.1088/1742-6596/16/1/071.
  • Bell N, Garland M. Implementing Sparse Matrix-vector Multiplication on Throughput-oriented Processors. Proc. Conf. High Perform. Comput. Networking, Storage Anal., New York, NY, USA: ACM; 2009, p. 18:1--18:11. doi:10.1145/1654059.1654078.
  • Yzelman AN, Roose D. High-Level Strategies for Parallel Shared-Memory Sparse Matrix-Vector Multiplication. IEEE Trans Parallel Distrib Syst 2014;25:116–25. doi:10.1109/TPDS.2013.31.
  • Saad Y. Iterative Methods for Sparse Linear Systems. SIAM; 2003. doi:10.1137/1.9780898718003.
  • Greathouse JL, Daga M. Efficient Sparse Matrix-vector Multiplication on GPUs Using the CSR Storage Format. Int. Conf. High Perform. Comput. Networking, Storage Anal. (SC ’14), 2014, p. 769–80. doi:10.1109/SC.2014.68.
  • Ashari A, Sedaghati N, Eisenlohr J, Parthasarathy S, Sadayappan P. Fast Sparse Matrix-vector Multiplication on GPUs for Graph Applications. Int. Conf. High Perform. Comput. Networking, Storage Anal. (SC ’14), 2014, p. 781–92. doi:10.1109/SC.2014.69.
  • Merrill D, Garland M. Merge-based Parallel Sparse Matrix-vector Multiplication. Int. Conf. High Perform. Comput. Networking, Storage Anal. (SC ’16), 2016, p. 58:1--58:12. doi:10.1109/SC.2016.57.
  • Liu Y, Schmidt B. LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs. Proc. Int. Conf. Appl. Syst. Archit. Process., vol. 2015–Septe, 2015, p. 82–9. doi:10.1109/ASAP.2015.7245713.
  • Ohshima S, Katagiri T, Matsumoto M. Performance Optimization of SpMV Using CRS Format by Considering OpenMP Scheduling on CPUs and MIC. IEEE Int. Symp. Embed. Multicore/Manycore SoCs, 2014, p. 253–60. doi:10.1109/MCSoC.2014.43.
  • Liu L, Liu M, Wang C, Wang J. LSRB-CSR: A low overhead storage format for SpMV on the GPU systems. Proc. Int. Conf. Parallel Distrib. Syst. - ICPADS, vol. 2016–Janua, 2016, p. 733–41. doi:10.1109/ICPADS.2015.97.
  • Duff T. Duff’s Device 1988. http://www.lysator.liu.se/c/duffs-device.html.
  • Youssefi A. Exploring the Potential for Accelerating Sparse Matrix-Vector Product on a Processing-in-Memory Architecture. Rice University, 2008.
  • Mellor-Crummey J, Garvin J. Optimizing Sparse Matrix-Vector Product Computations Using Unroll and Jam. Int J High Perform Comput Appl 2004;18:225–36. doi:10.1177/1094342004038951.
  • Koster J. Parallel Templates for Numerical Linear Algebra, A High-Performance Computation Library. Utrecht University, 2002.
  • Davis TA, Hu Y. The University of Florida Sparse Matrix Collection. ACM Trans Math Softw 2011;38:1:1--1:25. doi:10.1145/2049662.2049663.
  • Kourtis K, Goumas G, Koziris N. Exploiting Compression Opportunities to Improve SpMxV Performance on Shared Memory Systems. ACM Trans Arch Code Optim 2010;7:16:1--16:31. doi:10.1145/1880037.1880041.
  • Karakasis V, Gkountouvas T, Kourtis K, Goumas G, Koziris N. An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication. IEEE Trans Parallel Distrib Syst 2013;24:1930–40. doi:10.1109/TPDS.2012.290.
  • Kamin S, Garzarán MJ, Aktemur B, Xu D, Yılmaz B, Chen Z. Optimization by Runtime Specialization for Sparse Matrix-vector Multiplication. Gener. Program. Concepts Exp. (GPCE ’14), 2014, p. 93–102. doi:10.1145/2658761.2658773.
  • Yılmaz B, Aktemur B, Garzarán MJ, Kamin S, Kıraç F. Autotuning Runtime Specialization for Sparse Matrix-Vector Multiplication. ACM Trans Arch Code Optim 2016;13:5:1--5:26. doi:10.1145/2851500.
  • Aktemur B. A sparse matrix-vector multiplication method with low preprocessing cost. Concurr Comput Pract Exp 2018;30:e4701. doi:10.1002/cpe.4701.
  • Yang W, Li K, Mo Z, Li K. Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs. IEEE Trans Comput 2015;64:2623–36. doi:10.1109/TC.2014.2366731.
  • Yzelman AN, Bisseling RH. Cache-Oblivious Sparse Matrix--Vector Multiplication by Using Sparse Matrix Partitioning Methods. SIAM J Sci Comput 2009;31:3128–54. doi:10.1137/080733243.
  • Martone M. Efficient multithreaded untransposed, transposed or symmetric sparse matrix--vector multiplication with the Recursive Sparse Blocks format. Parallel Comput 2014;40:251–70. doi:10.1016/j.parco.2014.03.008.
  • Çatalyürek Ü V., Aykanat C. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Trans Parallel Distrib Syst 1999;10:673–93. doi:10.1109/71.780863.
  • Willcock J, Lumsdaine A. Accelerating Sparse Matrix Computations via Data Compression. Proc. 20th Annu. Int. Conf. Supercomput., New York, NY, USA: ACM; 2006, p. 307–16. doi:10.1145/1183401.1183444.


There are 32 citations in total.

Details

Primary Language Turkish
Subjects Engineering
Journal Section Research Article
Authors

Barış Aktemur (ORCID: 0000-0002-1414-9338)

Publication Date May 15, 2020
Published in Issue Year 2020

Cite

APA Aktemur, B. (2020). Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, 22(65), 315-324. https://doi.org/10.21205/deufmd.2020226501
AMA Aktemur B. Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı. DEUFMD. May 2020;22(65):315-324. doi:10.21205/deufmd.2020226501
Chicago Aktemur, Barış. “Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi 22, no. 65 (May 2020): 315-24. https://doi.org/10.21205/deufmd.2020226501.
EndNote Aktemur B (May 1, 2020) Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 22 65 315–324.
IEEE B. Aktemur, “Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı”, DEUFMD, vol. 22, no. 65, pp. 315–324, 2020, doi: 10.21205/deufmd.2020226501.
ISNAD Aktemur, Barış. “Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 22/65 (May 2020), 315-324. https://doi.org/10.21205/deufmd.2020226501.
JAMA Aktemur B. Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı. DEUFMD. 2020;22:315–324.
MLA Aktemur, Barış. “Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, vol. 22, no. 65, 2020, pp. 315-24, doi:10.21205/deufmd.2020226501.
Vancouver Aktemur B. Duff Aygıtı Tabanlı Seyrek Matris-Vektör Çarpımı. DEUFMD. 2020;22(65):315-24.

Dokuz Eylül Üniversitesi, Mühendislik Fakültesi Dekanlığı Tınaztepe Yerleşkesi, Adatepe Mah. Doğuş Cad. No: 207-I / 35390 Buca-İZMİR.