Journal of Intelligence and Information Systems,
Vol. 18, No. 4, December 2012
Measuring the Economic Impact of Item Descriptions on Sales Performance
Dongwon Lee, Sung-Hyuk Park, and Songchun Moon
Vol. 18, No. 4, Page: 1 ~ 17
Keywords : App Description, Sales Performance Analysis, Text Mining
Personalized smart devices such as smartphones and smart pads are widely used. Unlike traditional feature phones, theses smart devices allow users to choose a variety of functions, which support not only daily experiences but also business operations. Actually, there exist a huge number of applications accessible by smart device users in online and mobile application markets. Users can choose apps that fit their own tastes and needs, which is impossible for conventional phone users. With the increase in app demand, the tastes and needs of app users are becoming more diverse. To meet these requirements, numerous apps with diverse functions are being released on the market, which leads to fierce competition. Unlike offline markets, online markets have a limitation in that purchasing decisions should be made without experiencing the items. Therefore, online customers rely more on item-related information that can be seen on the item page in which online markets commonly provide details about each item. Customers can feel confident about the quality of an item through the online information and decide whether to purchase it. The same is true of online app markets. To win the sales competition against other apps that perform similar functions, app developers need to focus on writing app descriptions to attract the attention of customers. If we can measure the effect of app descriptions on sales without regard to the app's price and quality, app descriptions that facilitate the sale of apps can be identified. This study intends to provide such a quantitative result for app developers who want to promote the sales of their apps. For this purpose, we collected app details including the descriptions written in Korean from one of the largest app markets in Korea, and then extracted keywords from the descriptions. Next, the impact of the keywords on sales performance was measured through our econometric model. Through this analysis, we were able to analyze the impact of each keyword itself, apart from that of the design or quality. The keywords, comprised of the attribute and evaluation of each app, are extracted by a morpheme analyzer. Our model with the keywords as its input variables was established to analyze their impact on sales performance. A regression analysis was conducted for each category in which apps are included. This analysis was required because we found the keywords, which are emphasized in app descriptions, different category-by-category. The analysis conducted not only for free apps but also for paid apps showed which keywords have more impact on sales performance for each type of app. In the analysis of paid apps in the education category, keywords such as 'search+easy' and 'words+abundant' showed higher effectiveness. In the same category, free apps whose keywords emphasize the quality of apps showed higher sales performance. One interesting fact is that keywords describing not only the app but also the need for the app have asignificant impact. Language learning apps, regardless of whether they are sold free or paid, showed higher sales performance by including the keywords 'foreign language study+important'. This result shows that motivation for the purchase affected sales. While item reviews are widely researched in online markets, item descriptions are not very actively studied. In the case of the mobile app markets, newly introduced apps may not have many item reviews because of the low quantity sold. In such cases, item descriptions can be regarded more important when customers make a decision about purchasing items. This study is the first trial to quantitatively analyze the relationship between an item description and its impact on sales performance. The results show that our research framework successfully provides a list of the most effective sales key terms with the estimates of their effectiveness. Although this study is performed for a specified type of item (i.e., mobile apps), our model can be applied to almost all of the items traded in online markets.
The Research on Recommender for New Customers Using Collaborative Filtering and Social Network Analysis
Chang-Hoon Shin, Ji-Won Lee, Han-Na Yang, and Il Young Choi
Vol. 18, No. 4, Page: 19 ~ 42
Keywords : Social Network Analysis, Recommender systems, Collaborative Filtering
Consumer consumption patterns are shifting rapidly as buyers migrate from offline markets to e-commerce routes, such as shopping channels on TV and internet shopping malls. In the offline markets consumers go shopping, see the shopping items, and choose from them. Recently consumers tend towards buying at shopping sites free from time and place. However, as e-commerce markets continue to expand, customers are complaining that it is becoming a bigger hassle to shop online. In the online shopping, shoppers have very limited information on the products. The delivered products can be different from what they have wanted. This case results to purchase cancellation. Because these things happen frequently, they are likely to refer to the consumer reviews and companies should be concerned about consumer's voice. E-commerce is a very important marketing tool for suppliers. It can recommend products to customers and connect them directly with suppliers with just a click of a button. The recommender system is being studied in various ways. Some of the more prominent ones include recommendation based on best-seller and demographics, contents filtering, and collaborative filtering. However, these systems all share two weaknesses : they cannot recommend products to consumers on a personal level, and they cannot recommend products to new consumers with no buying history. To fix these problems, we can use the information which has been collected from the questionnaires about their demographics and preference ratings. But, consumers feel these questionnaires are a burden and are unlikely to provide correct information. This study investigates combining collaborative filtering with the centrality of social network analysis. This centrality measure provides the information to infer the preference of new consumers from the shopping history of existing and previous ones. While the past researches had focused on the existing consumers with similar shopping patterns, this study tried to improve the accuracy of recommendation with all shopping information, which included not only similar shopping patterns but also dissimilar ones. Data used in this study, Movie Lens' data, was made by Group Lens research Project Team at University of Minnesota to recommend movies with a collaborative filtering technique. This data was built from the questionnaires of 943 respondents which gave the information on the preference ratings on 1,684 movies. Total data of 100,000 was organized by time, with initial data of 50,000 being existing customers and the latter 50,000 being new customers. The proposed recommender system consists of three systems : [+] group recommender system, [-] group recommender system, and integrated recommender system. [+] group recommender system looks at customers with similar buying patterns as 'neighbors', whereas [-] group recommender system looks at customers with opposite buying patterns as 'contraries'. Integrated recommender system uses both of the aforementioned recommender systems to recommend movies that both recommender systems pick. The study of three systems allows us to find the most suitable recommender system that will optimize accuracy and customer satisfaction. Our analysis showed that integrated recommender system is the best solution among the three systems studied, followed by [-] group recommended system and [+] group recommender system. This result conforms to the intuition that the accuracy of recommendation can be improved using all the relevant information. We provided contour maps and graphs to easily compare the accuracy of each recommender system. Although we saw improvement on accuracy with the integrated recommender system, we must remember that this research is based on static data with no live customers. In other words, consumers did not see the movies actually recommended from the system. Also, this recommendation system may not work well with products other than movies. Thus, it is important to note that recommendation systems need particular calibration for specific product/customer types.
A Hybrid Forecasting Framework based on Case-based Reasoning and Artificial Neural Network
Yousub Hwang
Vol. 18, No. 4, Page: 43 ~ 57
Keywords : Service Demand Forecasting, Self-Organizing Maps, Artificial Neural Network
To enhance the competitive advantage in a constantly changing business environment, an enterprise management must make the right decision in many business activities based on both internal and external information. Thus, providing accurate information plays a prominent role in management's decision making. Intuitively, historical data can provide a feasible estimate through the forecasting models. Therefore, if the service department can estimate the service quantity for the next period, the service department can then effectively control the inventory of service related resources such as human, parts, and other facilities. In addition, the production department can make load map for improving its product quality. Therefore, obtaining an accurate service forecast most likely appears to be critical to manufacturing companies. Numerous investigations addressing this problem have generally employed statistical methods, such as regression or autoregressive and moving average simulation. However, these methods are only efficient for data with are seasonal or cyclical. If the data are influenced by the special characteristics of product, they are not feasible. In our research, we propose a forecasting framework that predicts service demand of manufacturing organization by combining Case-based reasoning (CBR) and leveraging an unsupervised artificial neural network based clustering analysis (i.e., Self-Organizing Maps; SOM). We believe that this is one of the first attempts at applying unsupervised artificial neural network-based machine-learning techniques in the service forecasting domain. Our proposed approach has several appealing features : (1) We applied CBR and SOM in a new forecasting domain such as service demand forecasting. (2) We proposed our combined approach between CBR and SOM in order to overcome limitations of traditional statistical forecasting methods and We have developed a service forecasting tool based on the proposed approach using an unsupervised artificial neural network and Case-based reasoning. In this research, we conducted an empirical study on a real digital TV manufacturer (i.e., Company A). In addition, we have empirically evaluated the proposed approach and tool using real sales and service related data from digital TV manufacturer. In our empirical experiments, we intend to explore the performance of our proposed service forecasting framework when compared to the performances predicted by other two service forecasting methods; one is traditional CBR based forecasting model and the other is the existing service forecasting model used by Company A. We ran each service forecasting 144 times; each time, input data were randomly sampled for each service forecasting framework. To evaluate accuracy of forecasting results, we used Mean Absolute Percentage Error (MAPE) as primary performance measure in our experiments. We conducted one-way ANOVA test with the 144 measurements of MAPE for three different service forecasting approaches. For example, the F-ratio of MAPE for three different service forecasting approaches is 67.25 and the p-value is 0.000. This means that the difference between the MAPE of the three different service forecasting approaches is significant at the level of 0.000. Since there is a significant difference among the different service forecasting approaches, we conducted Tukey's HSD post hoc test to determine exactly which means of MAPE are significantly different from which other ones. In terms of MAPE, Tukey's HSD post hoc test grouped the three different service forecasting approaches into three different subsets in the following order: our proposed approach > traditional CBR-based service forecasting approach > the existing forecasting approach used by Company A. Consequently, our empirical experiments show that our proposed approach outperformed the traditional CBR based forecasting model and the existing service forecasting model used by Company A. The rest of this paper is organized as follows. Section 2 provides some research background information such as summary of CBR and SOM. Section 3 presents a hybrid service forecasting framework based on Case-based Reasoning and Self-Organizing Maps, while the empirical evaluation results are summarized in Section 4. Conclusion and future research directions are finally discussed in Section 5.
Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique
Myeong-Kyun Kim, and Yoonho Cho
Vol. 18, No. 4, Page: 59 ~ 77
Keywords : Rights Issue, Financial Analysis Index, Decision Tree, Predictive Model, Predictive Model
This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, performs a variety of analyses for firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In the paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. This study approaches to building the predictive models from the perspective of two different analyses. The first is the analysis period. We divide the analysis period into before and after the IMF financial crisis, and examine whether there is the difference between the two periods. The second is the prediction time. In order to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one year, two years and three years later. Therefore Total six prediction models are developed and analyzed. In this paper, we employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method which builds decision trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. There are well-known decision tree induction algorithms such as CHAID, CART, QUEST, C5.0, etc. Among them, we use C5.0 algorithm which is the most recently developed algorithm and yields performance better than other algorithms. We obtained data for the rights issue and financial analysis from TS2000 of Korea Listed Companies Association. A record of financial analysis data is consisted of 89 variables which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices. For the model building and test, we used 10,925 financial analysis data of total 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. Total 84 variables among financial analysis data are selected as the input variables of each model, and the rights issue status (issued or not issued) is defined as the output variable. To develop prediction models using C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of data for model building and 40% of data for model test. The results of experimental analysis show that the prediction accuracies of data after the IMF financial crisis (59.04% to 60.43%) are about 10 percent higher than ones before IMF financial crisis (68.78% to 71.41%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and the firm intention of rights issue has been more obvious. The experiment results also show that the stability-related indices have a major impact on conducting rights issue in the case of short-term prediction. On the other hand, the long-term prediction of conducting rights issue is affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of significant variables. This means that companies in different types of industries show their different types of patterns for rights issue. We conclude that it is desirable for stakeholders to take into account stability-related indices and more various financial analysis indices for short-term prediction and long-term prediction, respectively. The current study has several limitations. First, we need to compare the differences in accuracy by using different data mining techniques such as neural networks, logistic regression and SVM. Second, we are required to develop and to evaluate new prediction models including variables which research in the theory of capital structure has mentioned about the relevance to rights issue.
Detecting Credit Loan Fraud Based on Individual-Level Utility
Keunho Choi, Gunwoo Kim, and Yongmoo Suh
Vol. 18, No. 4, Page: 79 ~ 95
Keywords : Utility-Sensitive Classification, Credit Loan Fraud, Fraud Detection
As credit loan products significantly increase in most financial institutions, the number of fraudulent transactions is also growing rapidly. Therefore, to manage the financial risks successfully, the financial institutions should reinforce the qualifications for a loan and augment the ability to detect a credit loan fraud proactively. In the process of building a classification model to detect credit loan frauds, utility from classification results (i.e., benefits from correct prediction and costs from incorrect prediction) is more important than the accuracy rate of classification. The objective of this paper is to propose a new approach to building a classification model for detecting credit loan fraud based on an individual-level utility. Experimental results show that the model comes up with higher utility than the fraud detection models which do not take into account the individual-level utility concept. Also, it is shown that the individual-level utility computed by the model is more accurate than the mean-level utility computed by other models, in both opportunity utility and cash flow perspectives. We provide diverse views on the experimental results from both perspectives.
Dispute of Part-Whole Representation in Conceptual Modeling
Taekyung Kim, Jinsoo Park, and Sangkyu Rho
Vol. 18, No. 4, Page: 97 ~ 116
Keywords : Conceptual Modeling, Part-Whole Relations, Bunge-Wand-Weber Ontology
Conceptual modeling is an important step for successful system development. It helps system designers and business practitioners share the same view on domain knowledge. If the work is successful, a result of conceptual modeling can be beneficial in increasing productivity and reducing failures. However, the value of conceptual modeling is unlikely to be evaluated uniformly because we are lack of agreement on how to elicit concepts and how to represent those with conceptual modeling constructs. Especially, designing relationships between components, also known as part-whole relationships, have been regarded as complicated work. The recent study, "Representing Part-Whole Relations in Conceptual Modeling : An Empirical Evaluation" (Shanks et al., 2008), published in MIS Quarterly, can be regarded as one of positive efforts. Not only the study is one of few attempts of trying to clarify how to select modeling alternatives in part-whole design, but also it shows results based on an empirical experiment. Shanks et al. argue that there are two modeling alternatives to represent part-whole relationships : an implicit representation and an explicit one. By conducting an experiment, they insist that the explicit representation increases the value of a conceptual model. Moreover, Shanks et al. justify their findings by citing the BWW ontology. Recently, the study from Shanks et al. faces criticism. Allen and March (2012) argue that Shanks et al.'s experiment is lack of validity and reliability since the experimental setting suffers from error-prone and self-defensive design. They point out that the experiment is intentionally fabricated to support the idea, as such that using concrete UML concepts results in positive results in understanding models. Additionally, Allen and March add that the experiment failed to consider boundary conditions; thus reducing credibility. Shanks and Weber (2012) contradict flatly the argument suggested by Allen and March (2012). To defend, they posit the BWW ontology is righteously applied in supporting the research. Moreover, the experiment, they insist, can be fairly acceptable. Therefore, Shanks and Weber argue that Allen and March distort the true value of Shanks et al. by pointing out minor limitations. In this study, we try to investigate the dispute around Shanks et al. in order to answer to the following question : "What is the proper value of the study conducted by Shanks et al.?" More profoundly, we question whether or not using the BWW ontology can be the only viable option of exploring better conceptual modeling methods and procedures. To understand key issues around the dispute, first we reviewed previous studies relating to the BWW ontology. We critically reviewed both of Shanks and Weber and Allen and March. With those findings, we further discuss theories on part-whole (or part-of) relationships that are rarely treated in the dispute. As a result, we found three additional evidences that are not sufficiently covered by the dispute. The main focus of the dispute is on the errors of experimental methods: Shanks et al. did not use Bunge's Ontology properly; the refutation of a paradigm shift is lack of concrete, logical rationale; the conceptualization on part-whole relations should be reformed. Conclusively, Allen and March indicate properly issues that weaken the value of Shanks et al. In general, their criticism is reasonable; however, they do not provide sufficient answers how to anchor future studies on part-whole relationships. We argue that the use of the BWW ontology should be rigorously evaluated by its original philosophical rationales surrounding part-whole existence. Moreover, conceptual modeling on the part-whole phenomena should be investigated with more plentiful lens of alternative theories. The criticism on Shanks et al. should not be regarded as a contradiction on evaluating modeling methods of alternative part-whole representations. To the contrary, it should be viewed as a call for research on usable and useful approaches to increase value of conceptual modeling.
Improved Social Network Analysis Method in SNS
Jong-Soo Sohn, Soo-Whan Cho, Kyung-Lag Kwon, and In-Jeong Chung
Vol. 18, No. 4, Page: 117 ~ 127
Keywords : SNS, SNA, Centrality, Closeness, Closeness
Due to the recent expansion of the Web 2.0 -based services, along with the widespread of smartphones, online social network services are being popularized among users. Online social network services are the online community services which enable users to communicate each other, share information and expand human relationships. In the social network services, each relation between users is represented by a graph consisting of nodes and links. As the users of online social network services are increasing rapidly, the SNS are actively utilized in enterprise marketing, analysis of social phenomenon and so on. Social Network Analysis (SNA) is the systematic way to analyze social relationships among the members of the social network using the network theory. In general social network theory consists of nodes and arcs, and it is often depicted in a social network diagram. In a social network diagram, nodes represent individual actors within the network and arcs represent relationships between the nodes. With SNA, we can measure relationships among the people such as degree of intimacy, intensity of connection and classification of the groups. Ever since Social Networking Services (SNS) have drawn increasing attention from millions of users, numerous researches have made to analyze their user relationships and messages. There are typical representative SNA methods: degree centrality, betweenness centrality and closeness centrality. In the degree of centrality analysis, the shortest path between nodes is not considered. However, it is used as a crucial factor in betweenness centrality, closeness centrality and other SNA methods. In previous researches in SNA, the computation time was not too expensive since the size of social network was small. Unfortunately, most SNA methods require significant time to process relevant data, and it makes difficult to apply the ever increasing SNS data in social network studies. For instance, if the number of nodes in online social network is n, the maximum number of link in social network is n(n-1)/2. It means that it is too expensive to analyze the social network, for example, if the number of nodes is 10,000 the number of links is 49,995,000. Therefore, we propose a heuristic-based method for finding the shortest path among users in the SNS user graph. Through the shortest path finding method, we will show how efficient our proposed approach may be by conducting betweenness centrality analysis and closeness centrality analysis, both of which are widely used in social network studies. Moreover, we devised an enhanced method with addition of best-first-search method and preprocessing step for the reduction of computation time and rapid search of the shortest paths in a huge size of online social network. Best-first-search method finds the shortest path heuristically, which generalizes human experiences. As large number of links is shared by only a few nodes in online social networks, most nods have relatively few connections. As a result, a node with multiple connections functions as a hub node. When searching for a particular node, looking for users with numerous links instead of searching all users indiscriminately has a better chance of finding the desired node more quickly. In this paper, we employ the degree of user node vn as heuristic evaluation function in a graph G = (N, E), where N is a set of vertices, and E is a set of links between two different nodes. As the heuristic evaluation function is used, the worst case could happen when the target node is situated in the bottom of skewed tree. In order to remove such a target node, the preprocessing step is conducted. Next, we find the shortest path between two nodes in social network efficiently and then analyze the social network. For the verification of the proposed method, we crawled 160,000 people from online and then constructed social network. Then we compared with previous methods, which are best-first-search and breath-first-search, in time for searching and analyzing. The suggested method takes 240 seconds to search nodes where breath-first-search based method takes 1,781 seconds (7.4 times faster). Moreover, for social network analysis, the suggested method is 6.8 times and 1.8 times faster than betweenness centrality analysis and closeness centrality analysis, respectively. The proposed method in this paper shows the possibility to analyze a large size of social network with the better performance in time. As a result, our method would improve the efficiency of social network analysis, making it particularly useful in studying social trends or phenomena.
Ontology-based User Customized Search Service Considering User Intention
Sukyoung Kim, and Gunwoo Kim
Vol. 18, No. 4, Page: 129 ~ 143
Keywords : Ontology, Search Engine, User-Customized Search
Recently, the rapid progress of a number of standardized web technologies and the proliferation of web users in the world bring an explosive increase of producing and consuming information documents on the web. In addition, most companies have produced, shared, and managed a huge number of information documents that are needed to perform their businesses. They also have discretionally raked, stored and managed a number of web documents published on the web for their business. Along with this increase of information documents that should be managed in the companies, the need of a solution to locate information documents more accurately among a huge number of information sources have increased. In order to satisfy the need of accurate search, the market size of search engine solution market is becoming increasingly expended. The most important functionality among much functionality provided by search engine is to locate accurate information documents from a huge information sources. The major metric to evaluate the accuracy of search engine is relevance that consists of two measures, precision and recall. Precision is thought of as a measure of exactness, that is, what percentage of information considered as true answer are actually such, whereas recall is a measure of completeness, that is, what percentage of true answer are retrieved as such. These two measures can be used differently according to the applied domain. If we need to exhaustively search information such as patent documents and research papers, it is better to increase the recall. On the other hand, when the amount of information is small scale, it is better to increase precision. Most of existing web search engines typically uses a keyword search method that returns web documents including keywords which correspond to search words entered by a user. This method has a virtue of locating all web documents quickly, even though many search words are inputted. However, this method has a fundamental imitation of not considering search intention of a user, thereby retrieving irrelevant results as well as relevant ones. Thus, it takes additional time and effort to set relevant ones out from all results returned by a search engine. That is, keyword search method can increase recall, while it is difficult to locate web documents which a user actually want to find because it does not provide a means of understanding the intention of a user and reflecting it to a progress of searching information. Thus, this research suggests a new method of combining ontology-based search solution with core search functionalities provided by existing search engine solutions. The method enables a search engine to provide optimal search results by inferenceing the search intention of a user. To that end, we build an ontology which contains concepts and relationships among them in a specific domain. The ontology is used to inference synonyms of a set of search keywords inputted by a user, thereby making the search intention of the user reflected into the progress of searching information more actively compared to existing search engines. Based on the proposed method we implement a prototype search system and test the system in the patent domain where we experiment on searching relevant documents associated with a patent. The experiment shows that our system increases the both recall and precision in accuracy and augments the search productivity by using improved user interface that enables a user to interact with our search system effectively. In the future research, we will study a means of validating the better performance of our prototype system by comparing other search engine solution and will extend the applied domain into other domains for searching information such as portal.

Advanced Search
Date Range