A New Framework To Extract Knowledge By Text Mining Tools

Year 2010, Volume: 5 Issue: 2, 165 - 177, 01.12.2010

Abstract

Nowadays, enterprises are inundated with large amounts of unstructured information in textual documents, web pages, e-mails, chats, forums and blogs. In recent years the number of documents available in electronic form has grown almost exponentially. It is therefore important to use a technological platform to extract and manage knowledge that is useful for business goals. The goal of knowledge management is to provide the right information, in the right format and at the right time, at all corporate levels. The aim of this paper is to present a new framework for managing unstructured knowledge with text mining technology. With text mining tools, we can obtain high performance and discover interesting hidden relationships among business data. Text mining technology uses semantic engines and artificial intelligence algorithms to mine, extract and classify knowledge. The extracted knowledge is useful for the Business Intelligence tools used by top managers in strategic planning.

References

  • Abulaish M., Jahiruddin S. and Dey L. (2009) “A Relation Mining and Visualization Framework for Automated Text Summarization”. In Proceedings of the 3rd international Conference on Pattern Recognition and Machine intelligence (New Delhi, India, December 16 - 20, 2009). S. Chaudhury, S. Mitra, C. A. Murthy, P. S. Sastry, and S. K. Pal, Eds. Lecture Notes In Computer Science, vol. 5909. Springer-Verlag, Berlin, Heidelberg, pp. 249-254
  • AITech-Assinform (2007) “Assinform report, ICT and multimedial contents”, Milano, Italy
  • Baars H. and Kemper H. (2008) “Management Support with Structured and Unstructured Data-An Integrated Business Intelligence Framework” Inf. Sys. Manag. 25, 2 (Mar. 2008), pp. 132-148
  • Berry M. W. and Castellanos M., editors (2007) “Survey of Text Mining II: Clustering, Classification, and Retrieval”, Springer
  • Blumberg R. and Atre S. (2003) “The problem with unstructured data”, DM Rev. February 2003
  • Butt J., Rutstein C., Gilett F. and Khawaja S. (2001) “Turning Data Into Dollars”, Forrester Research, May 2001
  • Canuto A. M., Campos A. M., Bezerra V. M. and Abreu M. C. (2007) “Investigating the use of a multi-agent system for knowledge discovery in databases”, Int. J. Hybrid Intell. Syst. 4, 1 (Jan. 2007), pp. 27-38
  • Chowdhary P., Mihaila G. and Lei H. (2006) “Model Driven Data Warehousing for Business Performance Management”, In Proceedings of the IEEE international Conference on E-Business Engineering (October 24 - 26, 2006). ICEBE. IEEE Computer Society, Washington, DC, pp. 483-487
  • Consoli D. (2010) “The multidimensional model of knowledge management in the competitive enterprise”. In Proceeding of 12th IS MM&T 2010, Sunny Beach, Bourgas, Bulgaria, Journal of International Scientific Publication: Materials, Method & Technologies, Vol. 4, p. 2, 2010, pp. 5-29.
  • De Rosnay, J. (2002) “Les risques de l’infopollution”, Transversales, Science Culture, Nouvelle série n°1, Mai, 2002
  • Gantz J. and Reinsel D. (2009) “As the economy contracts, the digital universe expands”, IDC Multimedia white paper, ECM, May 2009
  • Gelernter J. and Lesk M. (2009) “Text mining for indexing”, In Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries (Austin, TX, USA, June 15 - 19, 2009). JCDL '09. ACM, New York, NY, pp. 467-468
  • Hruschka E. R., Campello R. J., Freitas A. A. and De Carvalho A. C. (2009) “A survey of evolutionary algorithms for clustering”. Trans. Sys. Man Cyber Part C 39, 2 (Mar. 2009), pp. 133-155
  • Kurgan, L. A. and Musilek, P. (2006) “A survey of Knowledge Discovery and Data Mining process models”. Knowl. Eng. Rev. 21, 1 (Mar. 2006), pp. 1-24
  • Kurland, O. and Lee L. (2009) “Clusters, language models, and ad hoc information retrieval”, ACM Trans. Inf. Syst. 27, 3 (May. 2009), pp. 1-39
  • Li Y. and Zhang H. (2007) “Two properties of SVD and its application in data hiding”, In Proceedings of the intelligent Computing 3rd international Conference on Advanced intelligent Computing theories and Applications (Qingdao, China, August 21 - 24, 2007). D. Huang, L. Heutte, and M. Loog, Eds. Lecture Notes In Computer Science. Springer-Verlag, Berlin, Heidelberg, pp. 679-689.
  • Li Y., Chung S. M., and Holt J. D. (2008) “Text document clustering based on frequent word meaning sequences”, Data Knowl. Eng. 64, 1 (Jan. 2008), pp. 381-404
  • Lyman P., Varian H.R., Charles P., Good N., Jordan L.L. and Pal J. (2003) “How much information?”, http://www2.sims.berkeley.edu/research/projects/how-much-info-2003
  • Malik R., Franke L. and Siebes A. (2006) “Combination of text-mining algorithms increases the performance”, Bioinformatics 22, 17 (Aug. 2006), pp. 2151-2157
  • Mastrogiannis N., Boutsinas B. and Giannikos I. (2009) “A method for improving the accuracy of data mining classification algorithms”, Comput. Oper. Res. 36, 10 (Oct. 2009), pp. 2829-2839
  • Nielsen Company (2010) “Understanding the Value of a Social Media Impression: A Nielsen and Facebook Joint Study”, New York, US, 2010
  • Nielsen J. (2003) “IM, Not IP (Information Pollution)”, Queue 1, 8 (Nov. 2003), pp. 76-75
  • Nonaka I. and Takeuchi H. (1995) “The Knowledge Creating Company: How Japanese Companies Create the Dynamics of Innovation”, Oxford University Press, New York, 1995
  • Orman L. (1984) “Fighting Information Pollution with Decision Support Systems”, Journal of Management Information Systems, 1(2), pp. 64-71
  • Rizzotto F. (2006) “White paper: Qualità e valore nella gestione dell'informazione nonstrutturata: gli strumenti basati sull'analisi semantica”, IDC company, 2006
  • Bolasco S., Canzonetti A., Capo F. M., della Ratta-Rinaldi F. and Singh B. K. (2005) “Understanding text mining: A pragmatic approach”, in Knowledge Mining, ser. Studies in Fuzziness and Soft Computing, S. Sirmakessis ed., Springer Verlag, 2005, vol. 185, pp. 31–50.
  • Qu S., Wang S. and Zou Y. (2008) “Improvement of text feature selection method based on tfidf”, in FITME '08: Proceedings of the 2008 International Seminar on Future Information Technology and Management Engineering. Washington, DC, USA: IEEE Computer Society, 2008, pp. 79-81.
  • Suh J. H., Park C. H. and Jeon S. H. (2010) “Applying text and data mining techniques to forecasting the trend of petitions filed to e-People”, Expert Syst. Appl. 37, 10 (Oct. 2010), pp. 7255-7268
  • Tanawongsuwan P. (2010) “Part-of-Speech Approach to Evaluation of Textbook Reviews”, in Proceedings of the 2010 Second international Conference on Computer and Network Technology (April 23 - 25, 2010). ICCNT. IEEE Computer Society, Washington, DC, pp. 352-356
  • Teradata (2006) “Insights from the Fifth Annual Teradata Survey Validate a Global Phenomenon”, Enterprise Decision-Making survey, 2006 Report, Teradata
  • Toffler A. (1990) “Powershift: Knowledge, Wealth and Violence at the Edge of the 21st Century”, Bantam Books, 1990
  • Tseng S. (2008) “Knowledge management system performance measure index”, Expert Syst. Appl. 34, 1 (Jan. 2008), pp. 734-745
  • Wilks Y. and Brewster C. (2009) “Natural Language Processing as a Foundation of the Semantic Web”, Found. Trends Web Sci. 1, 3–4 (Mar. 2009), pp. 199-327
  • Zhai C. (2008) “Statistical Language Models for Information Retrieval: A Critical Review”, Found. Trends Inf. Retr. 2, 3 (Mar. 2008), pp. 137-213
  • Zhuge H. and Sun Y. (2010) “The schema theory for semantic link network”, Future Gener. Comput. Syst. 26, 3 (Mar. 2010), pp. 408-420

There are 35 citations in total.

Details

Primary Language Turkish
Journal Section Articles
Authors

Domenico Consoli

Publication Date December 1, 2010
Published in Issue Year 2010 Volume: 5 Issue: 2

Cite

APA Consoli, D. (2010). A New Framework To Extract Knowledge By Text Mining Tools. Bilgi Ekonomisi Ve Yönetimi Dergisi, 5(2), 165-177.