DIGITAL LIBRARY ARCHIVE
The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction
Se-Hak Chun (Department of Business Administration, Seoul National University of Science and Technology)
Vol. 25, No. 3, Page: 239 ~ 251
Keywords
k-nearest neighbor, case based reasoning, stock market prediction, python
Abstract
Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, support vector machines (SVM), and genetic algorithms, have been widely used in stock market prediction. In particular, a case-based reasoning method known as k-nearest neighbor (k-NN) is also widely used for stock price prediction. When a new problem occurs, case-based reasoning retrieves several similar cases from previous cases and combines their class labels to create a classification for the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space, so it always selects the same number of neighbors rather than only the most similar neighbors for the target case. As a result, it may have to take more cases into account even when fewer cases are applicable to the target. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for every target case, and predictability can be degraded by deviation from the desired similar neighbors.
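The retrieval step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not give the distance metric or the combination rule, so Euclidean distance and a plain average of the neighbors' targets are assumed here, and `knn_predict` is a hypothetical name.

```python
import math


def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k stored cases closest to `query`.

    Assumptions (not from the paper): Euclidean distance and an
    unweighted mean of the neighbors' target values.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of stored cases")
    # Distance from the query to every stored case, paired with its target.
    scored = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Sort on distance only, then keep the k closest cases.
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    return sum(y for _, y in nearest) / k


# Made-up cases: two close to the query, one far away.
cases = [(1.0, 1.0), (2.0, 2.0), (10.0, 10.0)]
targets = [1.0, 2.0, 10.0]

close_only = knn_predict(cases, targets, (1.5, 1.5), k=2)
with_far_case = knn_predict(cases, targets, (1.5, 1.5), k=3)
```

With `k=2` only the two nearby cases are averaged. With `k=3` the distant case is forced into the neighborhood and pulls the estimate away from the query's surroundings, which is the fixed-k weakness the abstract points out.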
This paper examines how the size of the learning data affects the stock price predictability of k-nearest neighbor, and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two learning datasets of different sizes. To predict the next day's closing price, we used four variables: the opening price, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data ran from January 1, 2018 to August 31, 2018 in both experiments.
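A minimal sketch of how such learning and test sets might be assembled is shown below. The price rows are made up, the helper names `make_cases` and `split_by_date` are hypothetical, and assigning each case to a period by the date of its input day is an assumption; the abstract does not describe the paper's actual preprocessing.

```python
from datetime import date


def make_cases(rows):
    """Pair each day's (open, high, low, close) with the next day's close.

    `rows` are (day, open, high, low, close) tuples in date order.
    """
    cases = []
    for today, tomorrow in zip(rows, rows[1:]):
        day, open_, high, low, close = today
        next_close = tomorrow[4]
        cases.append((day, (open_, high, low, close), next_close))
    return cases


def split_by_date(cases, train_start, train_end, test_start, test_end):
    """Split cases into learning and test sets by the date of the input day."""
    train = [c for c in cases if train_start <= c[0] <= train_end]
    test = [c for c in cases if test_start <= c[0] <= test_end]
    return train, test


# Made-up rows for illustration only; these are not Samsung Electronics prices.
rows = [
    (date(2017, 12, 27), 100.0, 102.0, 99.0, 101.0),
    (date(2017, 12, 28), 101.0, 103.0, 100.0, 102.0),
    (date(2018, 1, 2), 102.0, 104.0, 101.0, 103.0),
    (date(2018, 1, 3), 103.0, 105.0, 102.0, 104.0),
]
cases = make_cases(rows)

# Date ranges of the second experiment and the shared test period.
train, test = split_by_date(
    cases,
    date(2015, 1, 1), date(2017, 12, 31),
    date(2018, 1, 1), date(2018, 8, 31),
)
```

Changing the first date to `date(2000, 1, 1)` would give the learning window of the first experiment while leaving the test period unchanged.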
We compared the performance of k-NN with that of the random walk model using the two learning datasets.
When the learning data were small (the 2015 to 2017 set), the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. When the learning data were large (the 2000 to 2017 set), the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that prediction power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally has better predictive power than the random walk model when the learning dataset is large, but not when it is relatively small.
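For reference, the sketch below shows one common way to compute MAPE and a random walk forecast. It assumes that MAPE is expressed in percent and that the random walk model predicts the next day's close as the current day's close; the abstract states neither detail, and the prices are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def random_walk_forecast(closes):
    """Forecast each next-day close as the current day's close (assumed form)."""
    return closes[:-1]


# Made-up closing prices for illustration only.
closes = [100.0, 102.0, 101.0, 104.0]
actual_next = closes[1:]                 # closes on days 2 to 4
baseline = random_walk_forecast(closes)  # closes on days 1 to 3
baseline_mape = mape(actual_next, baseline)
```

A k-NN forecast for the same test days can be scored with the same `mape` function, so the two models are compared on identical targets.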
Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. Also, to produce better results, it is recommended that k-nearest neighbor find its nearest neighbors using a second-step filtering method that considers fundamental economic variables, as well as a sufficient amount of learning data.
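A Study on the Predictability of k-NN by Data Size: The Case of Samsung Electronics Stock Prices (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)
Se-Hak Chun (천세학) (Department of Business Administration, Seoul National University of Science and Technology)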
Keywords
k-nearest neighbor algorithm, case-based reasoning, stock market prediction, Python
Abstract
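This paper examines how case-based reasoning affects stock price predictability depending on the size of the learning data.
Using Samsung Electronics stock prices, we compared the case in which learning data from 2000 to 2017 were used with the case in which learning data from 2015 to 2017 were used. The test data ran from January 1, 2018 to August 31, 2018 in both cases. For time series data, the study was conducted from two perspectives: how useful past data are, and how important the number of similar cases is. The experimental results showed that predictability was higher with more learning data than with less. Comparing by MAPE, when the learning data were small, k-NN did not outperform the random walk model regardless of the number of similar cases. However, when the learning data were large, k-NN generally showed better predictability than the random walk model. To improve stock price predictability, we propose that k-NN and other data mining methodologies, in addition to increasing the size of the learning data, find and apply period-similar cases that take macroeconomic variables into account.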
Cite this article
JIIS Style
Chun, S.-H., "The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction", Journal of Intelligence and Information Systems, Vol. 25, No. 3 (2019), 239~251.

IEEE Style
Se-Hak Chun, "The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction", Journal of Intelligence and Information Systems, vol. 25, no. 3, pp. 239~251, 2019.

ACM Style
Chun, S.-H. 2019. The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction. Journal of Intelligence and Information Systems. 25, 3, 239--251.
Export Formats: BibTeX, EndNote

@article{Chun:JIIS:2019:790,
author = {Chun, Se-Hak},
title = {The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction},
journal = {Journal of Intelligence and Information Systems},
issue_date = {September 2019},
volume = {25},
number = {3},
month = Sep,
year = {2019},
issn = {2288-4866},
pages = {239--251},
url = {},
doi = {},
publisher = {Korea Intelligent Information System Society},
address = {Seoul, Republic of Korea},
keywords = {k-nearest neighbor, case based reasoning, stock market prediction, python},
}
%0 Journal Article
%1 790
%A Se-Hak Chun
%T The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 25
%N 3
%P 239-251
%D 2019
%R
%I Korea Intelligent Information System Society