DIGITAL LIBRARY ARCHIVE
Journal of Intelligence and Information Systems,
Vol. 25, No. 2, June 2019
|Real-time CRM Strategy of Big Data and Smart Offering System: KB Kookmin Card Case
Vol. 25, No. 2, Page: 1 ~ 23
Keywords : KB credit card, Smart offering system, Bigdata, Real-time CRM, Finance data mining
Big data refers to data that is difficult to store, manage, and analyze with existing software. As changes in consumer lifestyles increase the size and variety of consumer needs, companies are investing a great deal of time and money to understand those needs. Companies in various industries use big data to improve their products and services to meet consumer needs, to analyze unstructured data, and to respond in real time to reactions to their products and services. The financial industry operates decision support systems that use financial data to develop financial products and manage customer risk. By using big data, financial institutions can effectively create added value across the value chain and develop more advanced customer relationship management (CRM) strategies. Financial institutions can use the purchase data and unstructured data generated by credit cards to identify and satisfy customers' desires. CRM has a granular process that can be measured in real time as it grows with information knowledge systems. With the development of information services and CRM, the platform has changed, and it has become possible to meet consumer needs in various environments. Recently, as consumer needs have diversified, more companies are providing systematic marketing services using data mining and advanced CRM techniques.
KB Kookmin Card, which started its credit card business in 1980, stabilized its processes and computer systems early and has actively introduced new technologies and systems.
In 2011, the credit card company was separated from the bank, and it went on to lead the market with the 'Hye-dam Card' and 'One Card', which departed from existing card concepts. In 2017, the total use of domestic credit cards and check cards grew by 5.6% year-on-year to 886 trillion won. In 2018, the company received a long-term rating of AA+ in its credit card evaluation. We confirmed that its credit rating was at the top level, supported by effective marketing strategies and services. At present, KB Kookmin Card emphasizes strategies that meet the individual needs of customers and maximize customer lifetime value by using customers' payment data. KB Kookmin Card combines internal and external big data and conducts marketing in real time or builds monitoring systems. It has built a marketing system that detects customer behavior in real time using big data, such as homepage visits and purchase history, derived from customer card information. The system is designed to capture customers' action events in real time and execute marketing by using the stores, locations, amounts, usage patterns, and other attributes of card transactions.
The company has created more than 280 different scenarios based on the customer life cycle and runs marketing plans that accommodate various customer groups in real time. It operates a smart offering system, a highly efficient marketing management system that detects customers' card usage, behavior, and location information in real time and provides more refined services by combining with various apps.
This study traces how CRM strategy has changed, from traditional CRM to the current CRM strategy. It then examines the current CRM strategy through KB Kookmin Card's big data utilization strategy and marketing activities and proposes a marketing plan for KB Kookmin Card's future CRM strategy. KB Kookmin Card should invest in securing increasingly sophisticated ICT and human resources for the success and continued growth of the smart offering system. It needs to establish a strategy for securing profit from a long-term perspective and proceed systematically. In particular, given current concerns about privacy violations and personal information leakage, efforts should be made to gain customers' acceptance of marketing that uses customer information and to build a corporate image that emphasizes security.
|A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network
Vol. 25, No. 2, Page: 25 ~ 38
Keywords : Natural Language Processing, Neural Tensor Network, Knowledge Entity, Stock, Artificial Intelligence
Selecting high-quality information that meets users' interests and needs from an overflow of content is becoming increasingly important. Amid this flood of information, efforts are being made to better reflect the user's intention in search results, rather than treating an information request as a simple string. Large IT companies such as Google and Microsoft also focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance in particular is a field where text data analysis is expected to be useful and to have great potential, because new information is constantly generated and the earlier the information is obtained, the more valuable it is. Automatic knowledge extraction can be effective in areas such as the financial sector, where the flow of information is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is difficult to build corpora from different fields with the same algorithm and to extract good-quality triples. Second, producing labeled text data manually becomes more difficult as the extent and scope of knowledge increase and patterns are constantly updated. Third, performance evaluation is difficult because of the characteristics of unsupervised learning. Finally, defining the problem of automatic knowledge extraction is not easy because the concept of knowledge is itself ambiguous.
To overcome the limits described above and improve the semantic performance of stock-related information search, this study extracts knowledge entities using a neural tensor network and evaluates the performance of the approach. Unlike previous work, the purpose of this study is to extract knowledge entities related to individual stock items. Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and to enhance the effectiveness of the model. This study makes the following three contributions. First, it presents a practical and simple automatic knowledge extraction method that can be readily applied.
Second, it shows that performance evaluation is possible through a simple problem definition. Finally, it increases the expressiveness of the knowledge by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented.
For the empirical study to confirm the usefulness of the presented model, we use experts' reports on 30 individual stocks, namely the top 30 items by frequency of publication from May 30, 2017 to May 21, 2018. The total number of reports is 5,600; 3,074 reports, about 55% of the total, are designated as the training set, and the remaining 45% are designated as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using the neural tensor network, one score function is trained per stock. Thus, when a new entity from the testing set appears, we calculate its score with every score function, and the stock whose function gives the highest score is predicted as the item related to the entity. To evaluate the presented model, we assess its predictive power and determine whether the score functions are well constructed by calculating the hit ratio over all reports in the testing set.
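The prediction and evaluation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the trained neural tensor network score functions are replaced by hypothetical lookup tables, the names `predict_stock`, `hit_ratio`, `StockA`, and `StockB` are invented for the example, and the hit ratio is computed per entity rather than per report for simplicity.

```python
from typing import Callable, Dict, List, Tuple

# A score function maps an entity to a relatedness score for one stock.
ScoreFn = Callable[[str], float]


def predict_stock(entity: str, score_fns: Dict[str, ScoreFn]) -> str:
    """Return the stock whose score function rates the entity highest."""
    return max(score_fns, key=lambda stock: score_fns[stock](entity))


def hit_ratio(samples: List[Tuple[str, str]], score_fns: Dict[str, ScoreFn]) -> float:
    """Share of (entity, true_stock) pairs for which the prediction is correct."""
    if not samples:
        raise ValueError("samples must not be empty")
    hits = sum(1 for entity, stock in samples
               if predict_stock(entity, score_fns) == stock)
    return hits / len(samples)


def make_lookup_score(table: Dict[str, float]) -> ScoreFn:
    """Hypothetical stand-in for a trained score function (assumption)."""
    return lambda entity: table.get(entity, 0.0)


# Toy data: two made-up stocks and three labeled test entities.
score_fns = {
    "StockA": make_lookup_score({"battery": 0.9, "display": 0.2}),
    "StockB": make_lookup_score({"battery": 0.1, "display": 0.8}),
}
samples = [("battery", "StockA"), ("display", "StockB"), ("display", "StockA")]
print(hit_ratio(samples, score_fns))
```

In this toy data the third sample is a miss, because "display" scores higher under the second stock's function, so two of three predictions are hits.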
As a result of the empirical study, the presented model shows a hit ratio of 69.3% on the testing set, which consists of 2,526 reports. This hit ratio is meaningfully high despite some constraints on the research. Looking at the prediction performance of the model for each stock, only 3 stocks, LG ELECTRONICS, KiaMtr, and Mando, show far lower performance than average. This result may be due to interference from other similar items and the generation of new knowledge.
In this paper, we propose a methodology for finding the key entities, or combinations of entities, needed to search for related information in accordance with the user's investment intention. Graph data is generated using only the named entity recognition tool and applied to the neural tensor network without learning a corpus or word vectors for the field. The empirical test confirms the effectiveness of the presented model, as described above. However, some limits and points to improve remain.
Most notably, the fact that the model performs especially poorly for only some stocks shows the need for further research. Finally, through the empirical study, we confirmed that the learning method presented in this study can be used to semantically match new text information with the related stocks.
|Robo-Advisor Algorithm with Intelligent View Model
Vol. 25, No. 2, Page: 39 ~ 55
Keywords : Robo-Advisor, Mean-Variance Optimization, Intelligent View Model, Black-Litterman Model
Recently, banks and large financial institutions have introduced many Robo-Advisor products.
A Robo-Advisor is a robot that produces an optimal asset allocation portfolio for investors by using financial engineering algorithms without any human intervention. Since its first introduction on Wall Street in 2008, the market has grown to 60 billion dollars and is expected to expand to 2,000 billion dollars by 2020. Because Robo-Advisor algorithms suggest asset allocations to investors, mathematical or statistical asset allocation strategies are applied. The mean-variance optimization model developed by Markowitz is the typical asset allocation model. The model is a simple but quite intuitive portfolio strategy: assets are allocated to minimize the risk of the portfolio while maximizing its expected return using optimization techniques. Despite its theoretical background, both academics and practitioners find that the standard mean-variance optimization portfolio is very sensitive to the expected returns calculated from past price data. Corner solutions, in which the portfolio is allocated to only a few assets, are often found.
The Black-Litterman optimization model overcomes these problems by choosing a neutral Capital Asset Pricing Model equilibrium point. Implied equilibrium returns for each asset are derived from the equilibrium market portfolio through reverse optimization. The Black-Litterman model uses a Bayesian approach to combine subjective views on the price forecasts of one or more assets with the implied equilibrium returns, resulting in new estimates of risk and expected returns. These new estimates can produce an optimal portfolio through the well-known Markowitz mean-variance optimization algorithm. If the investor does not have any views on the asset classes, the Black-Litterman optimization model produces the same portfolio as the market portfolio. What if the subjective views are incorrect? A survey of reports on the performance of stocks recommended by securities analysts shows very poor results. Therefore, incorrect views combined with implied equilibrium returns may produce very poor portfolios for Black-Litterman model users.
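For reference, the reverse optimization and Bayesian combination described above are usually written as follows in the standard textbook form of the model. The abstract does not give its formulas, so the notation is an assumption: \(\delta\) is a risk-aversion coefficient, \(\Sigma\) the covariance matrix of asset returns, \(w_{\mathrm{mkt}}\) the market-capitalization weights, \(\tau\) a scaling constant, \(P\) the matrix that selects the assets involved in each view, \(Q\) the vector of view returns, and \(\Omega\) the covariance of view errors.

```latex
\begin{align}
\Pi &= \delta \, \Sigma \, w_{\mathrm{mkt}}
  && \text{(implied equilibrium returns)} \\
\hat{\mu} &= \left[ (\tau\Sigma)^{-1} + P^{\top} \Omega^{-1} P \right]^{-1}
             \left[ (\tau\Sigma)^{-1} \Pi + P^{\top} \Omega^{-1} Q \right]
  && \text{(posterior expected returns)}
\end{align}
```

With no views, the terms in \(P\) and \(Q\) drop out and \(\hat{\mu} = \Pi\), which is why the optimizer then returns the market portfolio. How a particular study maps its own view inputs onto \(P\) and \(Q\) may differ from this standard convention.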
This paper suggests an objective investor views model based on Support Vector Machines (SVM), which have shown good performance in stock price forecasting. An SVM is a discriminative classifier defined by a separating hyperplane. Linear, radial basis, and polynomial kernel functions are used to learn the hyperplanes. The input variables for the SVM are returns, standard deviations, Stochastics %K, and price parity degree for each asset class. The SVM outputs expected stock price movements and their probabilities, which are used as input variables in the intelligent views model. The stock price movements are categorized into three phases: down, neutral, and up. The expected stock returns make up the P matrix, and their probability results are used in the Q matrix. The implied equilibrium returns vector is combined with the intelligent views matrix, resulting in the Black-Litterman optimal portfolio. For comparison, the Markowitz mean-variance optimization model and the risk parity model are used. The value-weighted market portfolio and the equal-weighted market portfolio are used as benchmark indexes.
We collect the 8 KOSPI 200 sector indexes from January 2008 to December 2018, comprising 132 monthly index values. The training period is from 2008 to 2015, and the testing period is from 2016 to 2018. Our suggested intelligent view model, combined with implied equilibrium returns, produced the optimal Black-Litterman portfolio. In the out-of-sample period, this portfolio performed better than the well-known Markowitz mean-variance optimization portfolio, the risk parity portfolio, and the market portfolio.
The Black-Litterman portfolio records a total return of 6.4% over the three-year period, which is the highest value. The maximum drawdown is -20.8%, which is also the lowest value. The Sharpe ratio, which measures the return-to-risk ratio, is also the highest at 0.17. Overall, our suggested view model shows the possibility of replacing subjective analysts' views with an objective view model so that practitioners can apply Robo-Advisor asset allocation algorithms in real trading.
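The three performance measures reported above can be computed from a series of periodic portfolio returns as sketched below. This is an illustrative sketch under stated assumptions: the abstract does not say whether the Sharpe ratio is annualized or which risk-free rate is used, so the function takes a per-period risk-free rate (defaulting to zero) and does not annualize. The function names and the sample returns are invented for the example.

```python
import math
from typing import List


def total_return(returns: List[float]) -> float:
    """Compound periodic returns into a single holding-period return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns: List[float]) -> float:
    """Largest peak-to-trough decline of portfolio value, as a negative fraction."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def sharpe_ratio(returns: List[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation (not annualized)."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    if variance == 0.0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(variance)


# Made-up returns for three periods (not data from the paper).
sample = [0.10, -0.20, 0.05]
print(total_return(sample), max_drawdown(sample), sharpe_ratio(sample))
```

Returning the drawdown as a negative fraction matches the sign convention of the figure quoted in the abstract.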
|A Study on the Strategy of IoT Industry Development in the 4th Industrial Revolution: Focusing on the direction of business model innovation
Vol. 25, No. 2, Page: 57 ~ 75
Keywords : Internet of Things, 4th industrial revolution, Business Model Canvas, 9-Block model, Business Strategy
In this paper, we study the direction of business model innovation in the Internet of Things industry, which is the most actively industrialized among the core technologies of the 4th Industrial Revolution. Policy, economic, social, and technical issues were derived using PEST analysis for global trend analysis. We also present the future prospects for the Internet of Things industry published by ICT-related global research institutes such as Gartner and International Data Corporation. Global research institutes predicted that competition in network technologies will be an issue for the industrial Internet (IIoT) and IoT (Internet of Things) based on infrastructure and platforms.
According to the PEST analysis, developed countries are pursuing policies to respond to the fourth industrial revolution through government-led cooperation with the private sector (businesses and research institutes).
South Korea was also in the process of expanding related R&D budgets and establishing related policies. On the economic side, the growth rate of the related industries (based on aggregate market value) and the performance of companies were reviewed. Industries related to the fourth industrial revolution in advanced countries overseas were found to be growing faster than other industries, while in Korea the growth of the "technical hardware and equipment" and "communication service" sectors was relatively low among industries related to the fourth industrial revolution. On the social side, enormous ripple effects across society are expected, largely due to changes in technology and industrial structure, changes in employment structure, changes in the number of jobs, and so on. On the technical side, changes were taking place in each industry, most notably in the health and medical sector and the manufacturing sector, which were changing rapidly as they merged with the technologies of the fourth industrial revolution.
In this paper, various management methodologies for innovating existing business models were reviewed to cope with the rapidly changing industrial environment brought about by the fourth industrial revolution. In addition, four criteria were established for selecting a management model to cope with the new business environment: 'Applicability', 'Agility', 'Diversity' and 'Connectivity'. An AHP analysis of an expert survey showed that the Business Model Canvas is best suited as a business model innovation methodology.
The results showed very high importance: 42.5 percent in terms of "Applicability", 48.1 percent in terms of "Agility", 47.6 percent in terms of "Diversity" and 42.9 percent in terms of "Connectivity". Thus, it was selected as a model that can be applied in diverse ways according to the industrial ecology and paradigm shift.
The Business Model Canvas is a relatively recent management strategy and a methodology for business model innovation. It identifies the value of a business model through a nine-block approach and covers the four key areas of business: customer, offer, infrastructure, and business feasibility. In this paper, the direction for expanding and applying the nine blocks is presented from the perspective of an IoT (ICT) company.
In conclusion, we discuss which Business Model Canvas models can be applied in the ICT convergence industry. Starting from the nine blocks and adapting them to the characteristics of the target company, various applications are possible, such as integrating or removing blocks to form five-block or seven-block models, or subdividing blocks to fit those characteristics.
Future research needs to develop customized business innovation methodologies for Internet of Things companies, or those that are performing Internet-based services.
In addition, in this study, the Business Model Canvas was identified from expert opinion as a useful tool for innovation. To extend and validate this research, further study is needed on detailed implementation strategies, such as application cases of various models and application models for actual companies.
|Construction and Application of Intelligent Decision Support System through Defense Ontology - Application example of Air Force Logistics Situation Management System
Vol. 25, No. 2, Page: 77 ~ 97
Keywords : Ontology, Decision Support System, Logistics situation management system, Performance Based Logistics, Reliability
The large amount of data that emerges from the hyper-connected environment of the Fourth Industrial Revolution is a major factor that distinguishes the Fourth Industrial Revolution from the existing production environment. This environment is two-sided in that it produces data while also using it, and the data produced in turn creates further value. Because of the massive scale of data, future information systems need to process more data in terms of quantity than existing information systems. In addition, in terms of quality, the ability to process that large amount of data accurately is required. In a small-scale information system, a person can accurately understand the system and obtain the necessary information, but in varied and complex systems that are difficult to understand accurately, it becomes increasingly difficult to acquire the desired information. In other words, more accurate processing of large amounts of data has become a basic condition for future information systems. This problem, which concerns the efficient performance of the information system, can be solved by building a semantic web that enables various kinds of information processing by expressing the collected data as an ontology that can be understood by both people and computers.
For example, as in most other organizations, IT has been introduced in the military, and most work is now done through information systems. As existing systems contain increasingly large amounts of data, efforts are needed to make the systems easier to use by making better use of their data.
An ontology-based system has a large semantic network of data through its connections with other systems, has a wide range of databases that can be utilized, and has the advantage of searching more precisely and quickly through relationships between predefined concepts. In this paper, we propose a defense ontology as a method for effective data management and decision support. To judge its applicability and effectiveness in an actual system, we reconstructed the existing air force logistics situation management system as an ontology-based system. The existing system was built to strengthen commanders' and practitioners' management and control of the logistics situation by providing real-time information on maintenance and distribution, because the complicated logistics information system, with its large amount of data, had become difficult to use. Although it takes pre-specified information from the existing logistics system and displays it as web pages, it is difficult to check anything other than the few items specified in advance, extending it with additional functions is time-consuming, and it is organized by category with no search function. It therefore shares the disadvantage of the existing logistics information system: it can be used easily only by those who already know the system well.
The ontology-based logistics situation management system is designed to provide intuitive visualization of the complex information in the existing logistics information system through the ontology.
To construct the logistics situation management system through the ontology, useful functions such as performance-based logistics support contract management and a component dictionary were additionally identified and included in the ontology. To confirm whether the constructed ontology can be used for decision support, meaningful analysis functions need to be implemented, such as calculating aircraft utilization rates and querying performance-based logistics contracts.
In particular, in contrast to the ontology databases built in past ontology studies, this study constructs an ontology from time-series data whose values change over time, such as the state of each aircraft by date, and confirms that the constructed ontology can calculate utilization rates based on various criteria in addition to the utilization rate that can already be computed.
In addition, the data related to performance-based logistics contracts, introduced as a new maintenance method for aircraft and other military supplies, can be queried in various ways, and the performance indexes used in performance-based logistics contracts are easy to calculate through reasoning and functions. We also propose a new performance index that complements the limitations of the currently applied performance indicators and calculate it through the ontology, confirming that the constructed ontology can be used in practice.
Finally, the failure rate or reliability of each component, including MTBF data for the selected fault-tolerant items, can be calculated from actual part consumption records. Mission reliability and system reliability are then calculated.
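As a rough illustration of how failure rate and reliability follow from MTBF, the sketch below uses the common constant-failure-rate (exponential) model and treats a system as components in series. Both modeling choices are assumptions made for this example; the abstract does not state which reliability model the ontology applies. The function names and figures are hypothetical.

```python
import math
from typing import Iterable


def failure_rate(mtbf_hours: float) -> float:
    """Failure rate per hour under a constant-failure-rate assumption."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 / mtbf_hours


def reliability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability that a component survives the mission: exp(-t / MTBF)."""
    return math.exp(-mission_hours * failure_rate(mtbf_hours))


def series_reliability(component_mtbfs: Iterable[float], mission_hours: float) -> float:
    """Reliability of a system that fails if any one component fails."""
    result = 1.0
    for mtbf in component_mtbfs:
        result *= reliability(mtbf, mission_hours)
    return result


# Hypothetical components with MTBFs of 500 h and 2,000 h on a 10 h mission.
print(reliability(500.0, 10.0))
print(series_reliability([500.0, 2000.0], 10.0))
```

In practice the MTBF itself would first be estimated from consumption records, for example as operating hours divided by the number of failures.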
To confirm the usability of the constructed ontology-based logistics situation management system, we evaluated it with the Technology Acceptance Model (TAM), a representative model for measuring technology acceptance, and found the proposed system to be more useful and convenient than the existing system.
|The effect of Big-data investment on the Market value of Firm
Vol. 25, No. 2, Page: 99 ~ 122
Keywords : Big-data investment, firm value, market reaction, efficient market hypothesis, event study
According to a recent IDC (International Data Corporation) report, by 2025 the total volume of data is estimated to reach 163 zettabytes, ten times that of 2016. The main generators of information are also shifting from consumers to corporations. The so-called "wave of big data" is arriving, and its aftermath affects entire industries and individual firms alike. Therefore, effective management of vast amounts of data is more important than ever for firms.
However, no previous studies have measured the effects of big data investment, even though a number of previous studies have quantitatively measured the effects of IT investment. Therefore, we quantitatively analyze the effects of big data investment to assist firms' investment decision making. This study applies the event study methodology, which takes the efficient market hypothesis as its theoretical basis, to measure the effect of firms' big data investment on the response of market investors. In addition, five sub-variables were set to analyze this effect in more depth: firm size classification, industry classification (finance and ICT), investment completion classification, and vendor existence classification. To measure the impact of big data investment announcements, data from 91 announcements from 2010 to 2017 were used, and the effect of investment was observed empirically through changes in corporate value immediately after the disclosure. The data on big data investment were collected from the 'News' category of Naver, the largest portal site in Korea. When selecting the target companies, we extracted the disclosures of companies listed on the KOSPI and KOSDAQ markets. During collection, the search keywords were 'Big data construction', 'Big data introduction', 'Big data investment', 'Big data order', and 'Big data development'.
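The core calculation of an event study can be sketched as below. This is a simplified illustration, not the authors' procedure: it uses the market-adjusted model, in which the abnormal return is the firm's return minus the market return on the same day, whereas the abstract does not say which expected-return model or event window was used. The function names and the return figures are invented for the example.

```python
from typing import List, Sequence, Tuple


def abnormal_returns(firm: Sequence[float], market: Sequence[float]) -> List[float]:
    """Market-adjusted abnormal returns: firm return minus market return per day."""
    if len(firm) != len(market):
        raise ValueError("firm and market series must have equal length")
    return [f - m for f, m in zip(firm, market)]


def cumulative_abnormal_return(firm: Sequence[float], market: Sequence[float]) -> float:
    """Sum of abnormal returns over the event window (CAR)."""
    return sum(abnormal_returns(firm, market))


def mean_car(events: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average CAR across all announcement events."""
    if not events:
        raise ValueError("events must not be empty")
    return sum(cumulative_abnormal_return(f, m) for f, m in events) / len(events)


# Two made-up announcements, each with a two-day window of (firm, market) returns.
events = [
    ([0.010, 0.030], [0.000, 0.010]),
    ([0.000, -0.010], [0.005, 0.005]),
]
print(mean_car(events))
```

A full study would also test whether the mean CAR differs significantly from zero, and would repeat the calculation for each subgroup, such as firm size or industry.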
The results of the empirical analysis are as follows. First, we found that the market value of the 91 publicly listed firms that announced big data investment increased by 0.92%. In particular, the market value of finance firms, non-ICT firms, and small-cap firms increased significantly. This result suggests that market investors perceive firms' big data investment positively and that announcements help them better understand it. Second, we statistically demonstrated that the market value of financial firms and non-ICT firms increases after a big data investment announcement. Third, this study measured the effect of big data investment by company size, comparing the top 30% and the bottom 30% by market capitalization rather than splitting at the median, in order to maximize the difference. The analysis showed that the investment effect was greater for small companies, and the difference between the two groups was clear. Fourth, one of the most significant features of this study is that the big data investment announcements are classified and structured according to whether a vendor is involved. We show that the investment effect for the group with vendor involvement is very large, indicating that market investors are very positive about the involvement of specialist big data vendors.
Last but not least, it is also interesting that market investors evaluate announcements of big data investments that are scheduled to be built more positively than announcements of completed ones.
Applying this to industry, it would be effective for a company to make a disclosure at the time it decides to invest in big data in order to increase its market value.
Our study has an academic implication, as no prior research has examined the impact of big data investment. It also has a practical implication in that it can serve as reference material for business decision makers considering big data investment.
|A study on the prediction of Korean NPL market return
Vol. 25, No. 2, Page: 123 ~ 139
Keywords : Artificial Neural Network, Decision Tree, Genetic Algorithm, Logistic Regression, NPL
The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, the market's history is short, as bad debt began to increase only after the global financial crisis in 2009 because of the recession in the real economy. NPLs have become a major investment product in recent years, as domestic capital market investment began to enter the NPL market in earnest. Although the domestic NPL market has received considerable attention because of its recent overheating, research on the NPL market has been insufficient because the history of capital market investment in the domestic NPL market is short. In addition, decision-making based on more scientific and systematic analysis is required because of declining profitability and price fluctuations driven by the real estate market.
In this study, we propose a prediction model that determines whether the benchmark yield will be achieved, using NPL market data in accordance with market demand. To build the model, we used Korean NPL data covering about 4 years, from December 2013 to December 2017. The data set contained 2,291 items in total. From the 11 variables that describe the characteristics of the real estate, only those related to the dependent variable were selected as independent variables. To select the variables, one-to-one t-tests, stepwise logistic regression, and a decision tree were performed.
As a result, seven independent variables were selected: purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principal Balance), and HP (Holding Period).
The dependent variable is a binary variable that indicates whether the benchmark rate of return is reached.
This is because a model that predicts a binary variable is more accurate than a model that predicts a continuous variable, and the accuracy of the model is directly related to its effectiveness. In addition, for a special purpose company, the main concern is whether or not to purchase the property, so knowing whether a certain level of return will be achieved is enough to make a decision. To ascertain whether 12%, the standard rate of return used in the industry, is a meaningful reference value, we constructed and compared predictive models with the dependent variable calculated at different threshold values. As a result, the predictive model constructed with the dependent variable calculated at the 12% standard rate of return had the best average hit ratio, 64.60%.
To propose an optimal prediction model based on the chosen dependent variable and the 7 independent variables, we constructed and compared prediction models using five methodologies: discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model. To do this, 10 sets of training data and testing data were extracted using 10-fold cross-validation. After building the models using this data, the hit ratios of the sets were averaged and the performance was compared. As a result, the average hit ratios of the prediction models constructed using discriminant analysis, logistic regression, decision tree, artificial neural network, and genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively. The model using the artificial neural network was confirmed to be the best.
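The evaluation protocol described above, averaging the hit ratio over 10 train/test splits, can be sketched as follows. This is a generic illustration rather than the authors' code: the helper names are invented, the folds are contiguous (real data would normally be shuffled first), and a trivial majority-class classifier stands in for the five model types.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A fitted model maps one feature value to a predicted label.
Classifier = Callable[[Any], Any]


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds of (train, test) index lists."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder over early folds
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def average_hit_ratio(xs: Sequence[Any], ys: Sequence[Any],
                      fit: Callable[[List[Any], List[Any]], Classifier],
                      k: int = 10) -> float:
    """Fit on each training fold, score on its test fold, and average the hit ratios."""
    ratios = []
    for train, test in k_fold_indices(len(ys), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(1 for i in test if model(xs[i]) == ys[i])
        ratios.append(hits / len(test))
    return sum(ratios) / len(ratios)


def fit_majority(train_x: List[Any], train_y: List[Any]) -> Classifier:
    """Hypothetical baseline: always predict the most common training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda _x: majority


# Toy data: 20 items, 14 of which reach the benchmark (label 1).
xs = list(range(20))
ys = [1] * 14 + [0] * 6
print(average_hit_ratio(xs, ys, fit_majority, k=10))
```

Any of the five model types could be plugged in through the `fit` argument, which keeps the comparison across models on identical folds.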
This study shows that the seven independent variables and the artificial neural network prediction model can be used effectively in the NPL market. The proposed model predicts in advance whether a new property will achieve the 12% return, which will help special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will become more liquid as transactions proceed at appropriate prices.
|Subject-Balanced Intelligent Text Summarization Scheme
Vol. 25, No. 2, Page: 141 ~ 166
Keywords : Document Summarization, Review Summarization, Text Mining, Topic Modeling, Word Embedding
Recently, channels such as social media and SNS have been creating enormous amounts of data. Within this data, the portion of unstructured data represented as text has increased geometrically. Because it is difficult to check all of this text, it is important to access it rapidly and grasp its key points. To meet this need for efficient understanding, many studies on text summarization have been proposed for handling and using tremendous amounts of text data. In particular, many summarization methods using machine learning and artificial intelligence algorithms have recently been proposed to generate summaries objectively and effectively, an approach called “automatic summarization”. However, almost all text summarization methods proposed to date construct summaries focused on the frequency of contents in the original documents.
Such summaries are limited in their ability to contain low-weight subjects that are mentioned less often in the original text.
If summaries include only the contents of major subjects, bias occurs and information is lost, so it is hard to ascertain every subject the documents contain. To avoid this bias, one can summarize with a balance between the topics a document has so that all of its subjects can be ascertained, but an imbalance in the distribution of those subjects still remains. To retain the balance of subjects in a summary, it is necessary to consider the proportion of every subject the documents originally have and also to allocate the portions of subjects equally, so that even sentences on minor subjects are sufficiently included in the summary.
In this study, we propose a “subject-balanced” text summarization method that secures balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summaries, we use two summary evaluation metrics, “completeness” and “succinctness”. Completeness means that a summary should fully include the contents of the original documents, and succinctness means that a summary has minimal duplication within itself. The proposed method has three phases. The first phase is constructing subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, highly related terms can be identified for every topic, and the subjects of the documents can be found from the topics, each composed of terms with similar meanings. Then a few terms that represent each subject well are selected. In this method, they are called “seed terms”. However, these terms are too few to explain each subject sufficiently, so enough terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion, finding terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and from those vectors the similarity between all terms can be derived using cosine similarity.
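The seed-term expansion step can be illustrated with a small sketch. It assumes word vectors are already available as plain lists of floats; the two-dimensional vectors, the example terms, and the 0.9 threshold below are made up for illustration and do not come from a trained Word2Vec model or from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_u * norm_v)

def expand_seed_terms(seed_terms, vectors, threshold=0.9):
    """Return non-seed terms whose best similarity to a seed term
    is at least `threshold` (a hypothetical cut-off)."""
    expanded = []
    for term, vec in vectors.items():
        if term in seed_terms:
            continue
        best = max(
            (cosine_similarity(vec, vectors[s]) for s in seed_terms if s in vectors),
            default=0.0,
        )
        if best >= threshold:
            expanded.append(term)
    return sorted(expanded)

# Hand-made toy vectors (assumed values, not Word2Vec output).
toy_vectors = {
    "room": [1.0, 0.0],
    "bed": [0.9, 0.1],
    "suite": [0.8, 0.6],
    "breakfast": [0.0, 1.0],
}
room_dictionary = expand_seed_terms({"room"}, toy_vectors, threshold=0.9)
```

With these toy values only "bed" is close enough to the seed term "room" to pass the threshold; in practice the cut-off and any further filtering would have to be tuned on real vectors.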
The higher the cosine similarity between two terms, the stronger the relationship between them. Terms that have high similarity values with the seed terms of each subject are therefore selected, and after filtering these expanded terms the subject dictionary is finally constructed. The next phase is allocating subjects to every sentence in the original documents. To grasp the contents of all sentences, frequency analysis is first conducted with the terms that compose the subject dictionaries. The TF-IDF weight of each subject is calculated after the frequency analysis, which shows how much each sentence explains each subject.
However, TF-IDF weights are unbounded and can grow without limit, so the TF-IDF weights of every subject for each sentence are normalized to values between 0 and 1. Each sentence is then allocated to the subject with its maximum TF-IDF weight, and a sentence group is finally constructed for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences, from which a similarity matrix is formed. By repeatedly selecting sentences, it is possible to generate a summary that fully includes the contents of the original documents and minimizes duplication within the summary itself.
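The subject allocation phase (normalize each subject's weights to the range 0 to 1, then assign every sentence to its highest-weighted subject) might look like the sketch below. The abstract does not say which normalization is used, so min-max scaling per subject is an assumption here, as are the function names and the toy weights.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - lo) / (hi - lo) for v in values]

def allocate_subjects(weights):
    """Assign each sentence to the subject with its largest normalized weight.

    `weights` maps a subject name to a list holding one TF-IDF weight
    per sentence; all lists are assumed to have the same length.
    """
    subjects = list(weights)
    normalized = {s: min_max_normalize(weights[s]) for s in subjects}
    n_sentences = len(next(iter(weights.values())))
    return [
        max(subjects, key=lambda s: normalized[s][i])
        for i in range(n_sentences)
    ]

def group_by_subject(allocation):
    """Collect sentence indices into one group per subject."""
    groups = {}
    for index, subject in enumerate(allocation):
        groups.setdefault(subject, []).append(index)
    return groups

# Toy per-subject weights for three sentences (assumed values).
toy_weights = {"room": [4.0, 0.0, 2.0], "food": [0.0, 10.0, 6.0]}
allocation = allocate_subjects(toy_weights)
groups = group_by_subject(allocation)
```

Normalizing before taking the maximum keeps a subject whose raw weights happen to be large from absorbing every sentence, which is the motivation the abstract gives for bounding the weights.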
To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verified that summaries from the proposed method better retain the balance of all subjects that the documents originally have.
|An Empirical Study on Statistical Optimization Model for the Portfolio Construction of Sponsored Search Advertising(SSA)
Vol. 25, No. 2, Page: 167 ~ 194
Keywords : Sponsored Search Advertising, CTR, Spillover, Optimization, Keyword Bidding
This research starts from four basic concepts that are confronted when making decisions in keyword bidding: incentive incompatibility, limited information, myopia, and the decision variable. To make these concepts concrete, four framework approaches are designed: a strategic approach for incentive incompatibility, a statistical approach for limited information, alternative optimization for myopia, and a new model approach for the decision variable.
The purpose of this research is to propose, through empirical tests, a statistical optimization model for constructing the portfolio of Sponsored Search Advertising (SSA) from the sponsor’s perspective, which can be used in portfolio decision making. Previous research has formulated the CTR estimation model using CPC, Rank, Impression, CVR, etc., individually or collectively as the independent variables.
However, many of these variables are not controllable in keyword bidding. Only CPC and Rank can be used as decision variables in the bidding system. The Classical SSA model is designed on the basic assumption that CPC is the decision variable and CTR is the response variable. However, this classical model faces many hurdles in the estimation of CTR. The main problem is the uncertainty between CPC and Rank: in keyword bidding, CPC fluctuates continuously even at the same Rank. This uncertainty raises questions about the credibility of the estimated CTR, along with practical management problems. Sponsors make keyword bidding decisions under limited information, so a strategic portfolio approach based on statistical models is necessary.
To solve the problem in the Classical SSA model, the New SSA model framework is designed on the basic assumption that Rank is the decision variable. Many papers propose Rank as the best decision variable for predicting CTR. Further, most search engine platforms provide options and algorithms that make it possible to bid by Rank, so sponsors can participate in keyword bidding with Rank. Therefore, this paper tests the validity of this New SSA model and its applicability to constructing the optimal portfolio in keyword bidding.
The research process is as follows. To perform the optimization analysis for constructing the keyword portfolio under the New SSA model, this study proposes criteria for categorizing keywords, selects representative keywords for each category, shows the non-linear relationship, screens the scenarios for CTR and CPC estimation, selects the best-fit model through a Goodness-of-Fit (GOF) test, formulates the optimization models, confirms the Spillover effects, and suggests a modified optimization model reflecting Spillover along with some strategic recommendations.
The optimization models using these CTR/CPC estimation models are tested empirically with the objective functions of (1) maximizing CTR (the CTR optimization model) and (2) maximizing expected profit reflecting CVR (the CVR optimization model). Both the CTR and CVR optimization test results show significant improvements under the suggested SSA model, and the model is valid for constructing the keyword portfolio using the CTR/CPC estimation models suggested in this study.
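To make the idea of Rank as the decision variable concrete, the sketch below picks one Rank (or no bid) per keyword so that expected clicks are maximized. It is only an illustration under stated assumptions: the abstract does not give the objective or constraints in detail, so the budget constraint, the use of expected clicks as a stand-in for the CTR objective, the exhaustive search, and all numbers are hypothetical.

```python
from itertools import product

def best_rank_portfolio(keywords, budget):
    """Exhaustively choose a Rank (or None for no bid) per keyword.

    `keywords` maps a keyword name to a dict with "impressions" and
    "ranks", where "ranks" maps a Rank to an estimated (CTR, CPC) pair.
    Maximizes expected clicks subject to expected cost <= budget.
    """
    names = list(keywords)
    options = [[None] + list(keywords[name]["ranks"]) for name in names]
    best_choice, best_clicks = None, -1.0
    for choice in product(*options):
        clicks = 0.0
        cost = 0.0
        for name, rank in zip(names, choice):
            if rank is None:
                continue  # keyword left out of the portfolio
            ctr, cpc = keywords[name]["ranks"][rank]
            expected_clicks = keywords[name]["impressions"] * ctr
            clicks += expected_clicks
            cost += expected_clicks * cpc
        if cost <= budget and clicks > best_clicks:
            best_choice, best_clicks = dict(zip(names, choice)), clicks
    return best_choice, best_clicks

# Made-up estimates: Rank -> (CTR, CPC) for two keyword categories.
toy_keywords = {
    "brand": {"impressions": 1000, "ranks": {1: (0.10, 2.0), 2: (0.05, 1.0)}},
    "generic": {"impressions": 2000, "ranks": {1: (0.04, 3.0), 2: (0.02, 1.0)}},
}
choice, clicks = best_rank_portfolio(toy_keywords, budget=250)
```

Exhaustive search is only workable for a handful of keywords and Ranks; it is used here because it makes the decision variable explicit, not because the paper uses it. A CVR-based variant would replace the click count in the objective with an expected profit term.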
However, one critical problem is found in the CVR optimization model: important keywords are excluded from the keyword portfolio because of myopia about their low immediate profit. To solve this problem, Markov Chain analysis is carried out and the concepts of Core Transit Keyword (CTK) and Expected Opportunity Profit (EOP) are introduced. The Revised CVR Optimization model is proposed and tested, and it shows validity in constructing the portfolio. The strategic guidelines and insights are as follows. Brand keywords are usually dominant in almost every aspect, including CTR, CVR, and expected profit. However, Generic keywords are found to be the CTK and to have spillover potential that may increase consumers’ awareness and lead them to Brand keywords. For this reason, Generic keywords should receive focus in keyword bidding.
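The abstract does not define how CTK or EOP are computed, so the following is only a generic Markov chain illustration of the spillover idea: a keyword with no direct conversions can still lead to conversions through later searches. The states, the transition probabilities, and the function name are all hypothetical.

```python
def absorption_probability(transitions, target, tol=1e-12, max_iter=10000):
    """Probability of eventually reaching `target` from each state.

    `transitions` maps every state to a dict of next-state probabilities.
    A state with an empty dict is treated as absorbing.
    """
    prob = {state: 0.0 for state in transitions}
    prob[target] = 1.0
    for _ in range(max_iter):
        delta = 0.0
        for state, out in transitions.items():
            if state == target or not out:
                continue  # absorbing states keep their fixed value
            new_value = sum(p * prob[nxt] for nxt, p in out.items())
            delta = max(delta, abs(new_value - prob[state]))
            prob[state] = new_value
        if delta < tol:
            break
    return prob

# Hypothetical search-session chain: a Generic keyword search never
# converts directly, but it can lead to a Brand keyword search.
toy_chain = {
    "generic": {"generic": 0.2, "brand": 0.4, "exit": 0.4},
    "brand": {"convert": 0.4, "exit": 0.6},
    "convert": {},
    "exit": {},
}
reach = absorption_probability(toy_chain, "convert")
```

In this toy chain the Generic state has zero immediate conversion probability yet a positive probability of eventually converting, which is the kind of indirect value a purely immediate-profit objective would overlook.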
The contributions of the thesis are to propose a novel SSA model based on Rank as the decision variable, to propose managing the keyword portfolio by categories according to the characteristics of keywords, to propose statistical modelling and management based on Rank in constructing the keyword portfolio, to perform empirical tests and propose new strategic guidelines that focus on the CTK, and to propose a modified CVR optimization objective function that reflects the spillover effect instead of the previous expected profit models.