
TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”

Year 2013, Volume: 5 Issue: 1, 98 - 107, 01.06.2013

Abstract

Today’s organizations are largely built on their documents, which are crucial sources of knowledge. Even when organizations know that these documents exist, it is often nearly impossible to extract the knowledge held captive inside them. Under these conditions, organizations choose to prepare the same document again rather than find the proper documents in their archives. Finding these documents, however, would save precious time and reduce redundant work. Topic models focus on extracting knowledge from these types of documents. In this study, our aim is to summarize topic model research and to explain the latest model concept through an imaginary case scenario.


Details

Other ID JA87VM42CG
Journal Section Articles
Authors

İhsan Tolga Medeni

Tunç Durmuş Medeni

Publication Date June 1, 2013
Submission Date June 1, 2013
Published in Issue Year 2013 Volume: 5 Issue: 1

Cite

APA Medeni, İ. T., & Medeni, T. D. (2013). TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”. International Journal of eBusiness and eGovernment Studies, 5(1), 98-107.
AMA Medeni İT, Medeni TD. TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL.” IJEBEG. June 2013;5(1):98-107.
Chicago Medeni, İhsan Tolga, and Tunç Durmuş Medeni. “TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: ‘A CASE SCENARIO ON KNOWLEDGE RETRIEVAL’”. International Journal of eBusiness and eGovernment Studies 5, no. 1 (June 2013): 98-107.
EndNote Medeni İT, Medeni TD (June 1, 2013) TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”. International Journal of eBusiness and eGovernment Studies 5 1 98–107.
IEEE İ. T. Medeni and T. D. Medeni, “TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: ‘A CASE SCENARIO ON KNOWLEDGE RETRIEVAL’”, IJEBEG, vol. 5, no. 1, pp. 98–107, 2013.
ISNAD Medeni, İhsan Tolga - Medeni, Tunç Durmuş. “TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: ‘A CASE SCENARIO ON KNOWLEDGE RETRIEVAL’”. International Journal of eBusiness and eGovernment Studies 5/1 (June 2013), 98-107.
JAMA Medeni İT, Medeni TD. TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”. IJEBEG. 2013;5:98–107.
MLA Medeni, İhsan Tolga and Tunç Durmuş Medeni. “TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: ‘A CASE SCENARIO ON KNOWLEDGE RETRIEVAL’”. International Journal of eBusiness and eGovernment Studies, vol. 5, no. 1, 2013, pp. 98-107.
Vancouver Medeni İT, Medeni TD. TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”. IJEBEG. 2013;5(1):98-107.