DIGITAL LIBRARY ARCHIVE
Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach
Minsik Lee (Department of Information and Industrial Engineering, Yonsei University)
Hong Joo Lee (Department of Business Administration, Catholic University of Korea)
Vol. 23, No. 2, Page: 123 ~ 138
http://dx.doi.org/10.13088/jiis.2017.23.2.123
Keywords
Stock Price, Neutral Terms, Text Mining, Online News
Abstract
Since the stock market is driven by the expectations of traders, many studies have tried to predict stock price movements by analyzing text data from various sources. This research covers not only the relationship between text data and stock price fluctuations, but also trading strategies based on news articles and social media responses. As in other text mining applications, studies that predict stock price movements build a term-document matrix and apply classification algorithms to it.
Because documents contain many words, it is better to build the term-document matrix from the words that contribute most to classification. Words whose frequency or importance is too low are removed. Words can also be selected by measuring how much each one contributes to classifying a document correctly.
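As a rough illustration of frequency-based term selection, the sketch below builds a small term-document matrix and drops terms that are absent from too many documents. The function names, the whitespace tokenizer, and the `max_sparsity` threshold are assumptions made for this example; the abstract does not specify the tooling or cutoff the authors used.

```python
from collections import Counter


def build_tdm(docs):
    """Return (vocab, matrix); matrix[i][j] is the count of term i in document j."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def remove_sparse_terms(vocab, matrix, max_sparsity):
    """Keep terms whose share of zero-count documents is at most max_sparsity."""
    kept = [
        (term, row)
        for term, row in zip(vocab, matrix)
        if row.count(0) / len(row) <= max_sparsity
    ]
    return [term for term, _ in kept], [row for _, row in kept]


# Hypothetical toy headlines, not data from the study.
docs = ["profit rises sharply", "profit falls", "profit outlook stable"]
vocab, matrix = build_tdm(docs)
dense_vocab, dense_matrix = remove_sparse_terms(vocab, matrix, max_sparsity=0.5)
```

Here only `profit` survives, because every other term is missing from two of the three documents.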
The basic approach to constructing a term-document matrix has been to collect all the documents to be analyzed and to select the words that influence the classification. In this study, we analyze the documents for each individual stock and select as neutral words those that appear in every category, that is, in news for both rising and falling prices. We then extract the words surrounding each selected neutral word and use them to generate the term-document matrix. The underlying idea is that the presence of a neutral word itself has little relation to stock movements, while the words around it are more likely to affect them. The resulting term-document matrix is then passed to an algorithm that classifies stock price movements.
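The neutral-word idea can be sketched as follows, under stated assumptions: a word counts as neutral when it occurs in both the rise and the fall documents of one stock, and "surrounding" means within a fixed token window. The `window` parameter and both function names are hypothetical, since the abstract does not state how the neighborhood is defined.

```python
def neutral_terms(up_docs, down_docs):
    """Terms that occur in both rise-day and fall-day documents (token lists)."""
    up_vocab = {w for doc in up_docs for w in doc}
    down_vocab = {w for doc in down_docs for w in doc}
    return up_vocab & down_vocab


def context_terms(doc, neutral, window=2):
    """Non-neutral tokens lying within `window` positions of a neutral token."""
    kept = []
    for i, word in enumerate(doc):
        if word in neutral:
            continue  # the neutral word itself is not used as a feature
        lo, hi = max(0, i - window), min(len(doc), i + window + 1)
        if any(doc[j] in neutral for j in range(lo, hi)):
            kept.append(word)
    return kept


# Hypothetical tokenized headlines for one stock, not data from the study.
up_docs = [["samsung", "shares", "surge", "on", "strong", "chip", "demand"]]
down_docs = [["samsung", "shares", "slump", "after", "weak", "guidance"]]
neutral = neutral_terms(up_docs, down_docs)
features = context_terms(up_docs[0], neutral, window=2)
```

The kept tokens would then replace the full document when the term-document matrix is built.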
In this study, we first removed stop words and selected neutral words for each stock. From the selected words, we then excluded those that also appear in news articles for other stocks. Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used the first three months of articles as training data and applied the model to the remaining month to predict the next day's stock price movement. We built the models with SVM, Boosting, and Random Forest. The stock market was open for a total of 80 days during the four months (2016/02/01 ~ 2016/05/31); the first 60 days served as the training set and the remaining 20 days as the test set. The proposed neutral-word-based algorithm showed better classification performance than the word selection method based on sparsity.
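The experimental setup (next-day labels and a 60/20 chronological split) might look like the minimal sketch below. The helper names are hypothetical, and treating an unchanged close as "down" is an assumption for illustration; the abstract does not say how ties were handled. Classifier fitting (SVM, Boosting, Random Forest) is left out.

```python
def label_next_day(closes):
    """Label day t 'up' if the close on day t+1 is higher than on day t, else 'down'.

    Assumption: an unchanged close is labelled 'down'.
    """
    return ["up" if nxt > cur else "down" for cur, nxt in zip(closes, closes[1:])]


def chronological_split(items, n_train):
    """Split a time-ordered sequence without shuffling: first n_train items train."""
    return items[:n_train], items[n_train:]


# Hypothetical closing prices, not data from the study.
closes = [100, 102, 101, 101, 105]
labels = label_next_day(closes)

# 80 trading days, first 60 for training and last 20 for testing.
trading_days = list(range(80))
train_days, test_days = chronological_split(trading_days, n_train=60)
```

Keeping the split chronological avoids training on news published after the days being predicted.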
This study predicted stock price movements by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price movements and compared the existing sparsity-based word extraction method with the suggested method of removing words from the term-document matrix. The suggested method differs from the existing one in that it uses not only the news articles for the corresponding stock but also news for other stocks to determine which words to extract. In other words, it removes not only the words that appear in news for both price increases and decreases but also the words that commonly appear in news for other stocks. In the comparison, the suggested method showed higher prediction accuracy.
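The cross-stock exclusion step could be sketched as below. This is only one possible reading: a selected term is dropped when it also appears in the news vocabulary of more than `max_other` other stocks. The function name, the `max_other` threshold, and the stock labels are hypothetical.

```python
def exclude_shared_terms(selected, vocab_by_stock, stock, max_other=0):
    """Drop selected terms that also occur in the news of too many other stocks."""
    other_vocabs = [v for s, v in vocab_by_stock.items() if s != stock]
    return {
        term
        for term in selected
        if sum(term in vocab for vocab in other_vocabs) <= max_other
    }


# Hypothetical per-stock news vocabularies, not data from the study.
vocab_by_stock = {
    "A": {"chip", "market", "profit"},
    "B": {"market", "oil"},
    "C": {"market", "profit", "bank"},
}
strict = exclude_shared_terms(vocab_by_stock["A"], vocab_by_stock, "A", max_other=0)
lenient = exclude_shared_terms(vocab_by_stock["A"], vocab_by_stock, "A", max_other=1)
```

With `max_other=0` only `chip` remains for stock A; raising the threshold to 1 also keeps `profit`, which appears in just one other stock's news.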
A limitation of this study is that stock price prediction was framed as classifying rises and falls, and the experiment covered only the top ten stocks, which do not represent the entire stock market. In addition, investment performance is difficult to show because the direction of a price movement and the rate of return may differ. Further research is therefore needed with more stocks and with yield prediction through trading simulation.
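A Stock Price Prediction Method Using Category-Neutral Words: Applying Text Mining
Minsik Lee (Department of Information and Industrial Engineering, Yonsei University)
Hong Joo Lee (School of Business Administration, Catholic University of Korea)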
Keywords
Stock Price Prediction, Neutral Words, Text Mining, Online News
Abstract
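Because the stock market moves in reflection of traders' expectations about companies and market conditions, studies have sought to predict stock price movements by analyzing text data from various sources. Since the aim is to predict price movements, research has examined not only simple rises and falls in stock prices but also trading in response to news articles or social media and the returns this produces. Like text mining approaches in other fields, studies that predict stock price movements have constructed a term-document matrix and applied it to classification algorithms.
Because documents contain many words, rather than building the term-document matrix from every word, the words that contribute strongly to classifying documents into categories should be selected. Considering word frequency, words with too low an occurrence frequency or importance are removed. Words may also be selected according to their contribution, measured as the degree to which a word helps classify documents correctly.
The basic way of constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence classification. In this study, we analyze the documents for each individual stock and select as neutral words those that appear in news for both rises and falls of that stock. Words appearing around the selected neutral words are extracted and used to generate the term-document matrix. The starting idea is that a neutral word itself has little association with stock price movement, while the words around it have more influence on stock price rises. The generated term-document matrix is then applied to an algorithm that classifies whether the stock price rises or falls.
In this study, we first selected neutral words for each stock and then additionally excluded those among the selected words that also appear frequently for other stocks. Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. A classification model was built with three months of news articles as training data, and the remaining one month of news articles was applied to the model to predict the next day's stock price movement. The neutral-word algorithm proposed in this study showed better classification performance than word selection based on sparsity.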
Cite this article
JIIS Style
Lee, M., and H. J. Lee, "Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach", Journal of Intelligence and Information Systems, Vol. 23, No. 2 (2017), 123~138.

IEEE Style
Minsik Lee, and Hong Joo Lee, "Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach", Journal of Intelligence and Information Systems, vol. 23, no. 2, pp. 123~138, 2017.

ACM Style
Lee, M., and Lee, H. J., 2017. Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach. Journal of Intelligence and Information Systems. 23, 2, 123--138.
Export Formats: BibTeX, EndNote

@article{Lee:JIIS:2017:693,
author = {Lee, Minsik and Lee, Hong Joo },
title = {Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach},
journal = {Journal of Intelligence and Information Systems},
issue_date = {June 2017},
volume = {23},
number = {2},
month = Jun,
year = {2017},
issn = {2288-4866},
pages = {123--138},
url = {http://dx.doi.org/10.13088/jiis.2017.23.2.123},
doi = {10.13088/jiis.2017.23.2.123},
publisher = {Korea Intelligent Information System Society},
address = {Seoul, Republic of Korea},
keywords = {Stock Price, Neutral Terms, Text Mining, Online News},
}
%0 Journal Article
%1 693
%A Minsik Lee
%A Hong Joo Lee
%T Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 23
%N 2
%P 123-138
%D 2017
%R http://dx.doi.org/10.13088/jiis.2017.23.2.123
%I Korea Intelligent Information System Society