Research Article

USING TEXTUAL FEATURES FOR THE DETECTION OF VANDALISM IN WIKIPEDIA: A COMPARATIVE APPROACH IN LOW-RESOURCE LANGUAGE SECTIONS

Year 2017, Volume: 5 Issue: 1, 80 - 87, 30.06.2017
https://doi.org/10.17261/Pressacademia.2017.575

Abstract

This study investigates the impact of using textual features for the detection of vandalism across low-resource language sections of Wikipedia. Building on textual features applied in previous research, we propose new features that allow machine learning-based text classifiers to better distinguish vandalism and to improve vandalism detection rates across languages. These features also enable us to compare the contributions of bots against vandalism, highlighting the differences between bots and editors with regard to vandalism detection. The proposed feature set is efficient and language-independent, and performs at a level similar to previous feature sets. Three Wikipedia sections are used for this purpose: Simple English (simple), Albanian (sq) and Bosnian (bs). We show that our set of textual features achieves similar and, in some cases, better vandalism detection rates across languages than previous research.
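To make the idea of language-independent textual features more concrete, the following is a minimal sketch in Python. It is not the authors' feature set, which the abstract does not list: the function names (`textual_features`, `longest_char_run`) and the five features shown are assumptions chosen for illustration, in the spirit of character-level features commonly used in vandalism detection. Each feature is computed from the text before and after an edit and needs no language-specific dictionary.

```python
from itertools import groupby


def longest_char_run(text: str) -> int:
    """Length of the longest run of one repeated character ('aaaa' -> 4)."""
    return max((sum(1 for _ in group) for _, group in groupby(text)), default=0)


def textual_features(old_text: str, new_text: str) -> dict:
    """Illustrative (hypothetical) language-independent features for one edit."""
    letters = [c for c in new_text if c.isalpha()]
    n_chars = len(new_text)
    return {
        # Relative size of the new revision; +1 avoids division by zero.
        "size_ratio": (len(new_text) + 1) / (len(old_text) + 1),
        # Share of upper-case letters among all letters.
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        # Share of digits among all characters.
        "digit_ratio": sum(c.isdigit() for c in new_text) / n_chars if n_chars else 0.0,
        # Share of characters that are neither alphanumeric nor whitespace.
        "non_alnum_ratio": (
            sum(not c.isalnum() and not c.isspace() for c in new_text) / n_chars
            if n_chars else 0.0
        ),
        # Long runs such as "!!!!!!" or "aaaaaa" are typical of nonsense edits.
        "longest_char_run": longest_char_run(new_text),
    }
```

A feature dictionary of this kind could be computed per revision and passed to any standard text classifier; which classifier and which features perform best in the Simple English, Albanian and Bosnian sections is a question for the full paper, not for this sketch.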



 

References

  • Adler B. T., de Alfaro L., Pye I., 2008, “Measuring author contributions to the Wikipedia”. In WikiSym ’08, Porto, Portugal, 8-10 September 2008. New York: ACM.
  • Adler B. T., de Alfaro L., Mola-Velasco S. M., Rosso P., and West A. G., 2011, “Wikipedia vandalism detection: Combining natural language, metadata, and reputation features”. In Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing - Volume Part II, CICLing'11, pages 277-288, Berlin, Heidelberg, Springer-Verlag.
  • Davis J. and Goadrich M., 2006, “The Relationship Between Precision-Recall and ROC Curves”. In Proceedings of the 23rd International Conference on Machine Learning (ICML).
  • Geiger R. S. and Ribes D., 2010, “The Work of Sustaining Order in Wikipedia: The Banning of a Vandal”. In Proceedings of the 22nd ACM Conference on Computer Supported Cooperative Work (CSCW).
  • Hunt J. W., McIlroy M. D., 1974, “An Algorithm for Differential File Comparison”, Computer Science Technical Report, Bell Laboratories.
  • Massey F. J., 1951, “The Kolmogorov-Smirnov Test for Goodness of Fit”. Journal of the American Statistical Association, 46.
  • Mola-Velasco S. M., 2010, “Wikipedia Vandalism Detection Through Machine Learning: Feature Review and New Proposals”. In CLEF (Notebook Papers/Labs/Workshops).
  • Susuri A., Hamiti M. and Dika A., 2016, “Machine Learning Based Detection of Vandalism in Wikipedia across Languages”. In Proceedings of the 5th Mediterranean Conference on Embedded Computing (MECO), Bar, Montenegro.
  • Tran K. N., Christen P., 2013, “Cross-language prediction of vandalism on Wikipedia using article views and revisions”. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD).
  • West A. G., 2013, “Damage Detection and Mitigation in Open Collaboration Applications”, Ph.D. thesis, University of Pennsylvania.
  • West A. G. and Lee I., 2011, “Multilingual Vandalism Detection using Language-Independent & Ex Post Facto Evidence”. In CLEF (Notebook Papers/Labs/Workshops).
There are 11 citations in total.

Details

Journal Section Articles
Authors

Arsim Susuri

Mentor Hamiti

Agni Dika

Publication Date June 30, 2017
Published in Issue Year 2017 Volume: 5 Issue: 1

Cite

APA Susuri, A., Hamiti, M., & Dika, A. (2017). USING TEXTUAL FEATURES FOR THE DETECTION OF VANDALISM IN WIKIPEDIA: A COMPARATIVE APPROACH IN LOW-RESOURCE LANGUAGE SECTIONS. PressAcademia Procedia, 5(1), 80-87. https://doi.org/10.17261/Pressacademia.2017.575
