Alternative CPU and GPU Parallel Computing Approaches for Improving Sequential Analysis of Probability Associations in Short Texts

Dima Alnahas; Ahmet Aydin

doi:10.17694/bajece.1069152

Research Article

Year 2022, Volume: 10 Issue: 4, 419 - 428, 19.10.2022

https://izlik.org/JA93YK26LW

Abstract

In linguistics, the probabilistic relations between co-occurring words can help interpret the knowledge conveyed in a text. Connectivity patterns in the vectorized representations of lexemes can be identified using bigram models of word sequences. The similarity of these patterns is assessed by applying cosine similarity and mean squared error measures to the word vectors of the text's probabilistic relation matrix. Parallel computing, in turn, enables fast data processing and analytics in many domains. In this paper, we aim to demonstrate the benefit of parallel computing for the computational challenges of extracting probabilistic relations between lexemes. We first explore the performance limitations of sequential semantic similarity analysis and then implement CPU and GPU parallel versions to show the benefits of multicore CPU and GPU utilization for computationally demanding applications. Our results indicate that the alternative parallel computing implementations can significantly enhance the performance and applicability of probabilistic relation graph models in linguistic analyses.
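The pipeline summarized in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' implementation, which is not shown on this page. It assumes a row-normalized bigram count matrix as the probabilistic relation matrix, and the function names (bigram_matrix, cosine_similarity, mean_squared_error) and the toy sentence are hypothetical.

```python
from collections import Counter
from math import sqrt


def bigram_matrix(tokens):
    """Return (vocab, P), where P[i][j] estimates P(vocab[j] follows vocab[i]).

    Assumption: the probabilistic relation matrix is a row-normalized
    bigram count matrix; the paper may define it differently.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    counts = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    n = len(vocab)
    matrix = [[0.0] * n for _ in range(n)]
    for (first, second), count in counts.items():
        matrix[index[first]][index[second]] = float(count)
    for row in matrix:
        total = sum(row)
        if total > 0:  # words with no successor keep an all-zero row
            for j in range(n):
                row[j] /= total
    return vocab, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mean_squared_error(u, v):
    """Mean of the squared element-wise differences between two word vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)


# Hypothetical toy input; the paper works on short texts from a blog corpus.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab, P = bigram_matrix(tokens)
row = {word: P[i] for i, word in enumerate(vocab)}

# "cat" and "dog" are both always followed by "sat", so their rows coincide.
print(cosine_similarity(row["cat"], row["dog"]))
print(mean_squared_error(row["cat"], row["dog"]))
```

Each pairwise comparison reads only two rows of the matrix and writes one result, so the comparisons are independent of one another. That independence is what makes this step a natural candidate for the CPU and GPU parallelization the paper investigates.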

Details

Primary Language	English
Subjects	Software Testing, Verification and Validation
Journal Section	Research Article
Authors	Dima Alnahas 0000-0002-6046-1066 Ahmet Aydin 0000-0002-4124-7275
Publication Date	October 19, 2022
DOI	https://doi.org/10.17694/bajece.1069152
IZ	https://izlik.org/JA93YK26LW
Published in Issue	Year 2022 Volume: 10 Issue: 4

Cite

APA	Alnahas, D., & Aydin, A. (2022). Alternative CPU and GPU Parallel Computing Approaches for Improving Sequential Analysis of Probability Associations in Short Texts. Balkan Journal of Electrical and Computer Engineering, 10(4), 419-428. https://doi.org/10.17694/bajece.1069152

All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided the original work and source are appropriately cited.