The main purpose of this study is to predict wine quality from
physicochemical data. Two large, separate data sets from the UC Irvine Machine
Learning Repository were used. They contain 1599 instances of red wine and 4898
instances of white wine, each described by 11 physicochemical features such as
alcohol, chlorides, density, total sulfur dioxide, free sulfur dioxide,
residual sugar, and pH. First, the
instances were classified as red or white wine with an accuracy of 99.5229%
using the random forests algorithm. Then, three data mining algorithms were
used to classify the quality of both red and white wine: k-nearest neighbours,
random forests, and support vector machines. There are 6 quality classes for
red wine and 7 for white wine. The most successful classification was obtained
with the random forests algorithm. The study also observed that using
principal component analysis for feature selection increases the
classification success rate of the random forests algorithm.
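
As a rough illustration of one of the three classifiers named above, the following is a minimal k-nearest-neighbours sketch in plain Python. It is not the study's implementation: the function names `standardize` and `knn_predict`, the choice of `k=3`, the use of only two features (alcohol and pH), and the toy rows are all assumptions made for this example, and the numbers are invented placeholders rather than values from the UCI data sets.

```python
import math
from collections import Counter


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k nearest training rows."""
    scaled_X, means, stds = standardize(train_X)
    scaled_q = [(v - m) / s for v, m, s in zip(query, means, stds)]
    # Sort training rows by Euclidean distance to the query.
    neighbours = sorted(
        (math.dist(x, scaled_q), y) for x, y in zip(scaled_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy rows: [alcohol, pH]. Placeholder values, not UCI data.
toy_X = [
    [9.4, 3.51], [9.8, 3.20], [9.5, 3.30],     # labelled quality 5
    [12.8, 3.15], [12.5, 3.25], [13.0, 3.10],  # labelled quality 7
]
toy_y = [5, 5, 5, 7, 7, 7]

print(knn_predict(toy_X, toy_y, [12.6, 3.20]))
print(knn_predict(toy_X, toy_y, [9.6, 3.40]))
```

Features are standardized before distances are computed because alcohol and pH are measured on different scales; without this step the feature with the larger numeric range would dominate the Euclidean distance.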
Subjects | Engineering |
---|---|
Journal Section | Research Article |
Authors | |
Publication Date | December 26, 2016 |
Published in Issue | Year 2016 Volume: 4 Issue: Special Issue-1 |