TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”

İhsan Tolga Medeni; Tunç Durmuş Medeni

Year 2013, Volume: 5 Issue: 1, 98 - 107, 01.06.2013

Abstract

Today’s organizations are largely built on their documents, which are crucial sources of knowledge. Even when an organization knows that these documents exist, it is often nearly impossible to extract the knowledge locked inside them. Under these conditions, organizations choose to prepare the same document again rather than find the relevant documents in their archives. Finding these documents, however, would save valuable time and reduce redundant work. The topic model approach focuses on extracting knowledge from such documents. In this study, our aim is to summarize topic model research and to explain the latest model concept through an imaginary case scenario.

Keywords

Topic Model, Knowledge Extraction, Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (pLSA), Latent Dirichlet Allocation (LDA)

Details

Other ID	JA87VM42CG
Journal Section	Articles
Authors	İhsan Tolga Medeni, Tunç Durmuş Medeni
Publication Date	June 1, 2013
Submission Date	June 1, 2013
Published in Issue	Year 2013 Volume: 5 Issue: 1

Cite

APA	Medeni, İ. T., & Medeni, T. D. (2013). TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”. International Journal of eBusiness and eGovernment Studies, 5(1), 98-107.
AMA	Medeni İT, Medeni TD. TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL.” IJEBEG. June 2013;5(1):98-107.
Chicago	Medeni, İhsan Tolga, and Tunç Durmuş Medeni. “TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: ‘A CASE SCENARIO ON KNOWLEDGE RETRIEVAL’”. International Journal of eBusiness and eGovernment Studies 5, no. 1 (June 2013): 98-107.
EndNote	Medeni İT, Medeni TD (June 1, 2013) TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”. International Journal of eBusiness and eGovernment Studies 5 1 98–107.
IEEE	İ. T. Medeni and T. D. Medeni, “TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: ‘A CASE SCENARIO ON KNOWLEDGE RETRIEVAL’”, IJEBEG, vol. 5, no. 1, pp. 98–107, 2013.
ISNAD	Medeni, İhsan Tolga - Medeni, Tunç Durmuş. “TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: ‘A CASE SCENARIO ON KNOWLEDGE RETRIEVAL’”. International Journal of eBusiness and eGovernment Studies 5/1 (June 2013), 98-107.
JAMA	Medeni İT, Medeni TD. TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”. IJEBEG. 2013;5:98–107.
MLA	Medeni, İhsan Tolga and Tunç Durmuş Medeni. “TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: ‘A CASE SCENARIO ON KNOWLEDGE RETRIEVAL’”. International Journal of eBusiness and eGovernment Studies, vol. 5, no. 1, 2013, pp. 98-107.
Vancouver	Medeni İT, Medeni TD. TOPIC MODEL IMPLEMENTATION TO FIND RELATED DOCUMENTS IN CORPORATE ARCHIVES IN REAL LIFE: “A CASE SCENARIO ON KNOWLEDGE RETRIEVAL”. IJEBEG. 2013;5(1):98-107.
