Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality
Choi Sukjae (Humanitas BigData Research Center, Kyung Hee University)
Lee Jungwon (School of Management, Kyung Hee University)
Kwon Ohbyung (School of Management, Kyung Hee University)
Vol. 23, No. 3, Page: 119 ~ 138
SVM, Financial Fraud Detection, Cybercrime, Crisis Management, Text Mining
Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertisements are distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more frequently. In this study, we propose a method for determining which sentences and documents posted to SNS are related to financial fraud.
First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and of emergency management. We also suggested an emergency management process consisting of Pre-Cybercriminality (e.g., risk identification) and Post-Cybercriminality steps. Of these, this paper focuses on risk identification.
The main process consists of data collection, preprocessing, and analysis. First, we selected two words, ‘daechul’ (loan) and ‘sachae’ (private loan), as seed words and collected data containing these words from SNS such as Twitter. Two researchers then judged whether each collected item was related to cybercriminality, particularly financial fraud. From the related items, we selected vocabulary as keywords when it consisted of nominals or symbols. With the selected keywords, we searched web sources such as Twitter, news, and blogs, and collected more than 820,000 articles.
The collected articles were refined through preprocessing and made into learning data. Preprocessing consists of three steps: morphological analysis, stop-word removal, and selection of valid parts of speech. In the morphological analysis step, a complex sentence is split into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text.
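The stop-word removal step can be illustrated with a minimal sketch. It assumes that "non-lexical elements" means digits, ASCII punctuation, and repeated whitespace; the abstract does not list the exact characters, and the function name `strip_non_lexical` and the sample sentence are hypothetical.

```python
import re
import string

# Assumption: digits, ASCII punctuation, and runs of whitespace are the
# "non-lexical elements"; the abstract does not give the exact character set.
_DIGITS = re.compile(r"\d+")
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")
_SPACES = re.compile(r"\s+")


def strip_non_lexical(text: str) -> str:
    """Replace digits and punctuation with spaces, then collapse whitespace."""
    text = _DIGITS.sub(" ", text)
    text = _PUNCT.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


# Made-up example sentence, not taken from the study's data.
cleaned = strip_non_lexical("Loan  approved in 10 minutes!!  Call 010-1234-5678.")
print(cleaned)
```

Replacing each match with a space rather than deleting it keeps neighbouring words from being fused together before the whitespace is collapsed.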
In the step of selecting valid parts of speech, only two kinds, nouns and symbols, are considered. Because nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal the text is, the more frequently symbols are used.
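The part-of-speech selection could look roughly like the sketch below. It assumes the morphological analyzer has already produced `(morpheme, tag)` pairs; the tag names `NOUN` and `SYMBOL`, the function name, and the sample tokens are hypothetical, since the abstract does not name the analyzer or its tag set.

```python
# Hypothetical tag names standing in for the analyzer's real tag set.
VALID_TAGS = {"NOUN", "SYMBOL"}


def select_valid_pos(tagged):
    """Keep only morphemes tagged as nouns or symbols, preserving order."""
    return [morpheme for morpheme, tag in tagged if tag in VALID_TAGS]


# Made-up, pre-tagged input; a real pipeline would get this from the analyzer.
tagged_sentence = [
    ("daechul", "NOUN"),
    ("★", "SYMBOL"),
    ("sachae", "NOUN"),
    ("quickly", "ADVERB"),
    ("is", "VERB"),
]
kept = select_valid_pos(tagged_sentence)
print(kept)
```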
Each selected item is labeled ‘legal’ or ‘illegal’, because turning the preprocessed data into learning data requires classifying whether each item is legitimate. The processed data is then converted into a corpus and a document-term matrix. Finally, the ‘legal’ and ‘illegal’ files were mixed and randomly divided into a learning data set and a test data set. In this study, 70% of the data was used as learning data and 30% as test data.
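As a rough illustration of the last two steps, the sketch below builds a document-term matrix from tokenized documents and splits the labeled rows 70/30 at random. The helper names `build_dtm` and `split_70_30`, the fixed random seed, and the toy documents are assumptions made for the example, not details from the paper.

```python
import random
from collections import Counter


def build_dtm(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term vocab[j] in docs[i]."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


def split_70_30(rows, labels, seed=0):
    """Shuffle (row, label) pairs together and split them 70% / 30%."""
    pairs = list(zip(rows, labels))
    random.Random(seed).shuffle(pairs)  # seed is an assumption, for repeatability
    cut = len(pairs) * 7 // 10
    return pairs[:cut], pairs[cut:]


# Toy, made-up tokenized documents and labels; not data from the study.
docs = [["daechul", "★", "daechul"]] * 5 + [["bank", "news"]] * 5
labels = ["illegal"] * 5 + ["legal"] * 5

vocab, matrix = build_dtm(docs)
train, test = split_70_30(matrix, labels)
print(vocab, len(train), len(test))
```

Shuffling the rows and labels as pairs keeps each document attached to its label after the ‘legal’ and ‘illegal’ items are mixed.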
SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the idea proposed in this paper, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal door-to-door sales, which is superior to that of Term Frequency, MLE, and the other methods compared. Hence, the results suggest that the proposed method is valid and practically usable.
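To make the role of gamma and the accuracy metric concrete, here is a small sketch. It assumes a radial basis function (RBF) kernel, which is the usual kernel that takes a gamma parameter; the abstract does not state which kernel was used. This is not a full SVM: the cost value only matters during training, which is not shown, and the function names and sample labels are hypothetical.

```python
import math

GAMMA = 0.5  # value reported in the abstract
COST = 10    # reported soft-margin penalty; unused in this kernel-only sketch


def rbf_kernel(x, y, gamma=GAMMA):
    """RBF kernel exp(-gamma * ||x - y||^2); assumed, not confirmed by the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def overall_accuracy(predicted, actual):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up predictions and labels, only to show how the metric is computed.
predicted = ["illegal", "legal", "legal", "illegal"]
actual = ["illegal", "legal", "illegal", "illegal"]
print(rbf_kernel([0, 0], [1, 1]), overall_accuracy(predicted, actual))
```

Under this assumption, a larger gamma makes the similarity between two document vectors fall off faster with distance, while a higher cost penalizes misclassified training documents more heavily.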
In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying SVM-like discrimination algorithms to text analysis. The study should also be useful to practitioners in the fields of brand management and opinion mining.
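A Text Mining Method for Detecting Financial Fraud for the Safety of Municipal Cyberspace
Choi Sukjae (Big Data Research Center, Kyung Hee University)
Lee Jungwon (School of Management, Kyung Hee University)
Kwon Ohbyung (School of Management, Kyung Hee University)
SVM, Financial Fraud, Cybercrime, Crisis Management, Text Mining
Recently, SNS has established itself not only as a means of personal communication but also as an important marketing channel. However, cybercrime has also evolved along with advances in information and communication technology, and illegal advertisements are being distributed on SNS in large quantities. As a result, personal information is stolen and monetary losses occur frequently. In this study, we propose a methodology that analyzes unstructured data, namely promotional posts delivered through SNS, to determine which posts are related to financial fraud (e.g., illegal lending and illegal door-to-door sales). The main components of the framework are the process of building learning data from illegal promotional posts, a scheme for constructing input data that takes the characteristics of the data into account, the choice of discrimination algorithm, and the selection of the information to be extracted. The method was used in a pilot test of a local government's financial fraud prevention program, and an analysis with real data verified that its accuracy in identifying financial fraud posts is far superior to that of human judgment, keyword extraction (Term Frequency), MLE, and other approaches.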
Cite this article
JIIS Style
Sukjae, C., L. Jungwon, and K. Ohbyung, "Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality", Journal of Intelligence and Information Systems, Vol. 23, No. 3 (2017), 119~138.

IEEE Style
Choi Sukjae, Lee Jungwon, and Kwon Ohbyung, "Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality", Journal of Intelligence and Information Systems, vol. 23, no. 3, pp. 119~138, 2017.

ACM Style
Sukjae, C., Jungwon, L., and Ohbyung, K., 2017. Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality. Journal of Intelligence and Information Systems. 23, 3, 119--138.
Export Formats: BibTeX, EndNote

author = {Sukjae, Choi and Jungwon, Lee and Ohbyung, Kwon},
title = {Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality},
journal = {Journal of Intelligence and Information Systems},
issue_date = {September 2017},
volume = {23},
number = {3},
month = Sep,
year = {2017},
issn = {2288-4866},
pages = {119--138},
url = { },
doi = {10.13088/jiis.2017.23.3.119},
publisher = {Korea Intelligent Information System Society},
address = {Seoul, Republic of Korea},
keywords = { SVM, Financial Fraud Detection, Cybercrime, Crisis Management and Text Mining },
%0 Journal Article
%1 700
%A Choi Sukjae
%A Lee Jungwon
%A Kwon Ohbyung
%T Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 23
%N 3
%P 119-138
%D 2017
%R 10.13088/jiis.2017.23.3.119
%I Korea Intelligent Information System Society