Research Article

Development of an application, infrastructure, and network observability platform

Year 2024, Volume: 6 Issue: 1, 22 - 30

Abstract

Observability refers to the degree to which the internal states of a system can be inferred from its outputs. A system is observable when all of its internal states can be determined precisely from external measurements. Observability plays a critical role in control theory and in the design of automatic control systems, where it bears on whether a system can be controlled. In this study, a platform was developed that collects observability data from applications, infrastructure, and networks in one place and supports real-time observation, correlation, standardization, and analysis of that data. Its purpose is to make sense of events and non-standard conditions that degrade performance at any technology layer and to deliver actionable smart notifications. A single platform thereby covers all observability needs and offers an integrated observability experience, replacing the use of multiple alternative platforms that are each only partially functional. Within the scope of the platform, tools, web services, user interfaces, and machine learning models were developed that receive relevant telemetry (logs, metrics, traces, events) from various systems, record and store it, visualize and report on it, generate alarms in defined situations with support from smart alarm methods, and make it easy to find the root causes of problems through root cause analysis, an approach used to identify and resolve the underlying cause of a problem. Additionally, the number of observability platform users was observed to increase by 30% with the developed platform.
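The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names: standardizing telemetry from different layers into one record shape, and raising an alarm when a metric deviates from its recent history. The `TelemetryRecord` fields, the raw payload keys (`type`, `name`, `ts`, `value`), and the z-score rule are all assumptions made for illustration; they are not the schema or the smart alarm method used by the platform.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Any, Dict, List


# Hypothetical unified record. Field names are illustrative assumptions,
# not the schema of the platform described in the abstract.
@dataclass(frozen=True)
class TelemetryRecord:
    source: str       # e.g. "application", "infrastructure", "network"
    kind: str         # one of "log", "metric", "trace", "event"
    name: str         # signal name, e.g. "cpu_usage"
    timestamp: float  # seconds since the epoch
    value: float


def normalize(raw: Dict[str, Any], source: str) -> TelemetryRecord:
    """Map a raw payload from any layer onto the common record shape.

    The payload keys used here are assumed for the example.
    """
    return TelemetryRecord(
        source=source,
        kind=str(raw.get("type", "metric")),
        name=str(raw["name"]),
        timestamp=float(raw["ts"]),
        value=float(raw["value"]),
    )


def is_anomalous(history: List[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` when it lies more than `threshold` sample standard
    deviations from the mean of `history` (a plain z-score rule)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        # A constant history: any different value counts as a deviation.
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > threshold
```

A caller would pass each incoming payload through `normalize`, keep a short window of recent values per signal name, and call `is_anomalous` on every new value. A production system would likely replace the z-score rule with a learned model, as the abstract mentions machine learning models without specifying them.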

Original title (Turkish): Uygulama, altyapı ve ağ gözlenebilirlik platformunun geliştirilmesi


Details

Primary Language: Turkish
Subjects: Electrical Engineering (Other)
Journal Section: Research Paper
Authors

Oğuzhan Demir 0009-0007-1985-2210

Ahmet Can Uğur 0009-0001-4911-5461

Mehmet Burak Deveci 0009-0004-6100-0511

Mehmet Fatih Akay 0000-0003-0780-0679

Ceren Ulus 0000-0003-2086-6381

Early Pub Date: July 19, 2024
Publication Date
Submission Date: July 12, 2024
Acceptance Date: July 19, 2024
Published in Issue: Year 2024, Volume 6, Issue 1

Cite

APA Demir, O., Uğur, A. C., Deveci, M. B., Akay, M. F., et al. (2024). Uygulama, altyapı ve ağ gözlenebilirlik platformunun geliştirilmesi. Uluslararası Mühendislik Tasarım Ve Teknoloji Dergisi, 6(1), 22-30.