
Digital Marketplaces Unleashed


by Claudia Linnhoff-Popien


  Transformation often involves weighting the relevance of words in a text using tf‐idf (term frequency – inverse document frequency). The tf‐idf value reflects the significance of a word in a document: it increases with the word's frequency within that document and decreases with the number of documents in the collection that contain the word.
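  As a minimal sketch of how such weights can be computed in practice (the library choice and the example documents below are illustrative assumptions, not taken from the chapter), scikit-learn's TfidfVectorizer turns a collection of texts into a matrix of tf‐idf weights:

    # Illustrative sketch: tf-idf weighting with scikit-learn; the example
    # documents are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer

    documents = [
        "the new phone has a great camera",
        "the camera of this phone is terrible",
        "great battery and a great screen",
    ]

    vectorizer = TfidfVectorizer()
    weights = vectorizer.fit_transform(documents)       # sparse (documents x terms) matrix

    # Show the tf-idf weight of every term that occurs in the first document.
    terms = vectorizer.get_feature_names_out()
    row = weights[0]
    for column, value in zip(row.indices, row.data):
        print(f"{terms[column]}: {value:.3f}")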

  This refined data is then stored in a structured form that allows data mining and the application of analysis algorithms, including document classification (Sect. 30.3.2), cluster analyses and association analyses [4].

  Fig. 30.2 illustrates the steps involved in the data mining and text mining process.

  Fig. 30.2 Steps of data mining and text mining. (Sharafi [4])

  30.3 Sentiment Analysis in Practice

  To better organize the overwhelming quantities of available online content for accurate analysis, it makes sense to classify texts according to sentiment. Sentiment analysis, or sentiment detection, serves a number of purposes, including text summarization, the moderation of online forums and monitoring the acceptance levels of a product or brand by following blog discussions [5]. Sentiment analysis is considerably more complex than traditional subject‐based classification. For example, although the sentence “How can anyone sit through this movie?” does not contain any obviously negative words, it nevertheless hints at a strongly negative sentiment [6].

  Companies from all industry sectors can benefit from sentiment analysis in many different ways, most notably through the increasingly widespread use of customer feedback and review sites for rating films, cars, books, travel, in fact an infinite number of products and services [5].

  Sentiment analysis is applied to social media platforms and online retailers for a range of purposes, such as competitor analysis, trend and market analysis, campaign monitoring, event detection, issues and crisis management, to name but a few.

  Media‐savvy company managers are increasingly recognizing the value of social networking websites as an incredibly rich source of information to identify market opportunities, improve product placement or keep an eye on their competitors. In other words, it allows them to analyze the entire structure of the market.

  Most studies classify sentiment by polarity as either positive, negative or neutral [5–7], usually by applying one of two common approaches: dictionary‐based and machine learning.

  30.3.1 Lexicon‐Based Approaches

  One method of the dictionary‐based approach is word spotting. This technique identifies words or phrases and either compares them with a dictionary database, or uses algorithms to determine their sentiment. To begin with, a sentence is segmented and a part of speech is assigned to each word (POS tagging). Then every word that is considered relevant to the sentiment of the text is categorized by polarity using a dictionary for reference. This procedure is called the dictionary‐based approach [5].

  Words can be analyzed in isolation or in context by taking valence shifters into consideration. These include negations, intensifiers and diminishers that can reverse, augment or lessen the meaning, and thus the polarity value, of a sentiment word. If a valence shifter is found in the vicinity of a sentiment word, its weighting is multiplied by the word's original polarity value [5].
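  A minimal sketch of this idea is shown below; the tiny polarity lexicon, the shifter weights and the whitespace tokenization are purely illustrative assumptions, and a real system would use a full sentiment dictionary together with POS tagging:

    # Illustrative word-spotting sketch: a hypothetical mini polarity lexicon plus
    # simple valence shifters (negation, intensifier, diminisher).
    POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
    SHIFTERS = {"not": -1.0, "very": 1.5, "slightly": 0.5}

    def sentence_polarity(sentence):
        tokens = sentence.lower().split()   # real systems would tokenize and POS-tag properly
        score = 0.0
        for i, token in enumerate(tokens):
            if token in POLARITY:
                value = POLARITY[token]
                # A valence shifter directly before the sentiment word multiplies
                # its weighting with the word's original polarity value.
                if i > 0 and tokens[i - 1] in SHIFTERS:
                    value *= SHIFTERS[tokens[i - 1]]
                score += value
        return score

    print(sentence_polarity("the camera is very good"))    # 1.5
    print(sentence_polarity("the battery is not good"))    # -1.0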

  The SO‐A approach (semantic orientation from association) allows an automated prediction of polarity based on the statistical association of a word with collections of positive and negative sample words [5]. The SO‐A of a word or a sentence is calculated as the difference between its strength of association with a positive set and its strength of association with a negative set [8]. The polarity of the text as a whole is then determined by calculating the average SO‐A value of all the sentences that make up the text. Alternatively, a document can be classified simply by counting the positive and negative words it contains: if the majority of sentiment words in a text are positive, the text as a whole is regarded as positive, and correspondingly for negative words. The advantage of word counting over machine learning approaches is that it does not require any training and can therefore also be applied when no training data is available.
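  Expressed as a formula (the notation below is an assumption following the common SO‑PMI formulation, not a definition given in the chapter), the semantic orientation of a word w is the difference between its association strength with a set of positive seed words P and a set of negative seed words N, where the association measure A is frequently taken to be pointwise mutual information:

    \mathrm{SO\text{-}A}(w) \;=\; \sum_{p \in P} A(w, p) \;-\; \sum_{n \in N} A(w, n),
    \qquad A(w, x) \;=\; \mathrm{PMI}(w, x) \;=\; \log_2 \frac{\Pr(w, x)}{\Pr(w)\,\Pr(x)}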

  However, the sheer quantities of information available from different domains render an automated approach to sentiment analysis almost indispensable [5].

  30.3.2 Machine Learning

  What exactly is meant by machine learning? Machine learning has become an intrinsic part of our daily lives. For example, when a customer makes an online purchase, they will then be presented with a range of similar or complementary products. Machine learning is defined as the process of generating models to learn from available data so as to be able to make predictions about future data.

  Microsoft’s cloud‐based Azure Machine Learning Studio, for example, allows the user to create workflows intuitively for machine learning, and there is a wide range of machine learning libraries available for developing prediction solutions. Users can also extend these solutions with their own R or Python scripts [9].

  Among the most frequently used machine learning methods for classification in the context of data and text mining are naive Bayes, support vector machines and maximum entropy classification [1, 6].

  Naive Bayes

  Naive Bayes is a simple learning algorithm that applies the Bayes rule in combination with the assumption that the features are independent for a given class. Although the independence assumption does not always apply in practice, naive Bayes nevertheless suffices in many situations as a reliable method of classification. Naive Bayes uses the information from a random sample – the training set – to estimate the underlying probability that a document belongs to a class [10].
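  Written out (the notation is an illustrative assumption, not taken from the chapter), the classifier assigns a document d consisting of the words w_1, ..., w_n to the class c with the highest posterior probability; the independence assumption lets the likelihood factor into per‑word probabilities estimated from the training set:

    \hat{c} \;=\; \underset{c}{\operatorname{arg\,max}} \; P(c \mid d)
            \;=\; \underset{c}{\operatorname{arg\,max}} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)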

  The advantages of this approach are that it is more robust to irrelevant features than other, more complex learning methods. Furthermore, it is a very fast classification technique that requires only a small amount of storage capacity [11].

  Support Vector Machines (SVM)

  Support vector machines are a type of linear algorithm used for classification. In the simplest form of binary classification, SVM finds a hyperplane that separates the two classes of the dataset with as wide a gap as possible [12]. For a linearly separable training set there are infinitely many such hyperplanes; all of them separate the training data correctly into two classes, but with varying quality. The optimum separation shows a clear gap between the categories. In other words, the margin between the groups of data points should be as wide as possible.

  The hyperplane which separates the classes from one another can be pinpointed from a small number of data points that lie on the margin. These are referred to as support vectors. First these support vectors are identified, then the margin is maximized and the separating hyperplane determined [11]. In practice, however, many problems in data analysis are described with non‐linear dependencies. The SVM technique can still be applied in such cases by simply adding kernel functions. The advantages of the SVM method lie in its high accuracy, flexibility, robustness and efficiency [12].
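  As a brief sketch of how an SVM can be applied to sentiment classification (the library, the toy training texts and the labels are illustrative assumptions, not an implementation described in the chapter), a tf‐idf representation can be combined with a linear SVM in a few lines:

    # Illustrative sketch: tf-idf features combined with a linear SVM for
    # sentiment polarity. The tiny training set and its labels are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    train_texts = [
        "great camera and a brilliant screen",
        "the battery life is excellent",
        "terrible build quality, very disappointing",
        "the worst phone I have ever owned",
    ]
    train_labels = ["positive", "positive", "negative", "negative"]

    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(train_texts, train_labels)

    print(model.predict(["the screen is brilliant but the battery is terrible"]))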

  Maximum Entropy Classification

  The entropy of a word is a quantitative measure of its information content. For example, the word “beautiful” carries a higher weighting than the word “nice”. In contrast to the naive Bayes classifier, maximum entropy classification uses weighted features. It is assumed that features with a higher weighting also classify the training set most accurately. Relationships between words, however, are disregarded entirely [13]. From the set of models that are consistent with the observed data, the classifier selects the one that maximizes the entropy; a higher entropy indicates a higher uncertainty in the probability distribution. The reasoning behind this is that any other model, whose entropy is not maximal, implicitly assumes information that is not supported by the observations [14].
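  In its usual exponential form (the notation is an illustrative assumption; the chapter itself gives no formula), the maximum entropy classifier models the class probability with feature functions f_i(d, c) and learned weights \lambda_i; the weights are chosen so that the model reproduces the feature statistics observed in the training data while otherwise remaining as uniform, i.e. as high in entropy, as possible:

    P(c \mid d) \;=\; \frac{\exp\!\big(\sum_{i} \lambda_i f_i(d, c)\big)}{\sum_{c'} \exp\!\big(\sum_{i} \lambda_i f_i(d, c')\big)}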

  30.4 Machine Learning – Inspired by Biology

  Computers, or rather classical computer programs, have to know everything. Bits and bytes are being processed, stored and retrieved every millisecond. The results can be reproduced at any time and traced logically. The computer strictly follows its instructions to the end. Although we have come a long way in the last 60 years with this type of programming, we are now beginning to encounter problems and issues that demonstrate that this classical type of information acquisition can be unmanageable, and might even become antiquated and useless.

  The time has come for new concepts: software that writes software, algorithms that adapt, optimize themselves and can even predict future results. On the face of it machines do just what we tell them to do. But many problems nowadays are too complex, and cannot be described with exact instructions. We are familiar with a multitude of survival strategies from the natural world. Nature constantly offers us new ways of confronting and mastering difficult situations. The question is therefore: How can machines be created that do exactly what is needed without having to be instructed [15]?

  This takes us into the field of artificial intelligence or, more precisely, “computational intelligence”. Inspired by biology, this field comprises three areas of information processing. These technologies transfer biological problem‐solving strategies to mathematical or engineering problems so that they can be put to practical use. The challenge is to derive a general rule from a quantity of data without explicitly instructing the system which rules to apply for the classification. Using large quantities of data, the software trains computers to interpret further data sets accurately and autonomously. Although the systems are trained to learn predictable behavior, they do not allow any insight into the approaches they have learned. This is where the analogy to the biological example becomes clear.

  Computational intelligence is based on algorithms of fuzzy logic [16], neural networks [17] and evolutionary algorithms [18]. These specialist fields often overlap. Evolutionary algorithms, for example, are used for the design of neural networks or fuzzy systems, or even neuro fuzzy systems [19], which on the one hand are visualized in the more comprehensible fuzzy form and on the other hand use the efficient learning behavior of neural networks.

  30.4.1 The Origins of Machine Learning

  The Dartmouth Summer Research Conference on Artificial Intelligence in Hanover, New Hampshire, is widely acknowledged as the birthplace of artificial intelligence research. It was in the summer of 1956 that John McCarthy organized a 10‐person, two‐month workshop to study neural networks, automata theory and intelligence in general. Although the workshop itself did not present any new findings, the participants were able to agree on a name for their new field of research. Besides McCarthy the attendees included Marvin Minsky and Claude Shannon who, over the next 20 years, came to dominate the field of artificial intelligence [20].

  John McCarthy’s core idea was “that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” Following the Dartmouth Conference, the newly established AI community set to work with great optimism and developed the first approaches to solving puzzles, logical reasoning and games like chess [20–22].

  Groundbreaking theoretical preliminary work had been carried out several years earlier, however, by Alan Turing. According to Turing we speak of “artificial intelligence” when an algorithm successfully solves a problem using human‐like responses, so that an evaluator is unable to distinguish whether the answer comes from a human or a machine [23].

  30.4.2 The 60s and 70s

  The 1960s saw the first programs to demonstrate intelligent behavior. In computer games like chess or draughts these programs were already following strategies that enabled them to beat human opponents. Research had brought computers to a level where they were able to develop their own solutions to a problem. Soon after, pattern recognition procedures were being explored for image and speech processing, with systems learning to process simple commands of natural speech and recognize patterns in images. These developments then flowed into industrial robot control. In spite of the huge expectations, AI research made slow progress, which eventually led to several major sponsors withdrawing their support. This period came to be known among the AI community as the first “AI winter” [20, 24].

  30.4.3 The 1980s to the Present Day

  From 1980 onwards, however, AI research enjoyed fresh impetus – both financially and conceptually. Greater emphasis on mathematization, neural networks (which had been somewhat neglected over the previous 20 years) and multi‐agent systems, also referred to as distributed AI, shaped the next phase of research. Expert systems had by this time reached a level that made them attractive for industrial applications [20].

  Based on the findings of the preceding years, where a system “only” had to respond to problems, these approaches were now applied to robots which, equipped with a body of their own, came to be perceived as autonomous beings. The obvious progression was to add rudimentary actions such as autonomous movement and facial expressions. A well‐known example was the Mars rover Sojourner, which landed on Mars in 1997 and explored its surroundings on wheels. Since then, scientists and other tech‐savvy hobbyists have organized competitions to show off their skills. One of the most famous is the RoboCup, organized by the Federation of International Robot‐soccer Association (FIRA) and first held in South Korea in 1996, where soccer‐playing robots compete against one another in different categories [25].

  Even if it at first seems like trivial amusement, Marvin Minsky once described the problems of AI research quite aptly: AI researchers have worked to solve those problems we humans find difficult, such as chess, but they didn’t make any progress on the problems that humans find easy [20].

  30.5 Learning Methods

  Machine learning therefore offers automated and precise predictions from dense, disordered information and converts this into a format that is useful to humans. Depending on the purpose of a machine learning system or algorithm, these models recommend future actions based on so‐called empirical values or probabilities. However, the learning process is crucial for a system to be able to make predictions. This process always follows the same basic sequence:

  To begin with, the model is trained using sample data. It computes an output for each training example, and the result is compared with the desired result. The difference between target and actual output is then fed back into the model, and the model's parameters are adjusted according to a suitable procedure (e. g. gradient descent with the backpropagation algorithm) in order to reduce the error to a pre‐defined minimum.
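  A minimal sketch of such a training loop, fitting a simple linear model with gradient descent on synthetic data (the data, learning rate and stopping threshold are illustrative assumptions), could look as follows:

    # Illustrative training loop: gradient descent for a one-dimensional linear model.
    # The synthetic data, learning rate and error threshold are hypothetical choices.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=100)    # noisy "desired results"

    w, b = 0.0, 0.0                                     # model parameters
    learning_rate = 0.01
    for step in range(1000):
        prediction = w * x + b                          # the model calculates its output
        error = prediction - y                          # target vs. actual difference
        mse = float(np.mean(error ** 2))
        if mse < 0.5:                                   # pre-defined error minimum reached
            break
        # Feed the error back and adjust the parameters by gradient descent.
        w -= learning_rate * np.mean(error * x)
        b -= learning_rate * np.mean(error)

    print(f"learned w={w:.2f}, b={b:.2f}, mse={mse:.3f}")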

  By continually adding data and empirical knowledge, a machine learning system can be trained to make its predictions for a specific use case even more accurate [26].

  30.6 Implementation Options

  There are many different options for implementing machine learning. Besides the various methods – which can also be applied in combination – the question arises: Which is the most suitable programming language, which frameworks or tools lend themselves to the task?

  In the science sector, scripting and standard languages have prevailed: instead of classical C/C++ programming, prototypes are often developed in Matlab, Python, Julia or R. For this reason the larger frameworks offer interfaces to further programming languages, so that the methods of these machine learning libraries can be combined with the user's own preferred programming language. The largest overlap can be achieved with the general‐purpose scripting language Python, which works very well in different environments and can be extended with the preferred programming languages (C/C++, Java, C#, etc.).

  In this context there are various prefabricated frameworks, which can be deployed in many different ways. Thanks to their documentation and numerous user forums, among other things, prefabricated frameworks are relatively easy to get started with compared with a single‐handed implementation [27].
  Acceleration/Parallelization

  To meet the growing demand for more and more processing power, huge advances have been made in the development of high‐performance computer systems, along with faster and more reliable communication networks. At present these systems work at rates of several gigahertz and communicate with one another through transmission networks at several gigabits per second. But existing technology may eventually no longer be able to meet our demand for data. Physical constraints like the speed of light mean that researchers will have to start looking for alternatives. Parallelization offers one solution.

  30.7 The Current Situation and Trends

  Current trends point to an increased outsourcing of IT structures to the cloud. The cloud, or cloud computing, offers a variety of service models: from complete infrastructures with virtual computers that can communicate with one another in virtual networks to the more basic software‐as‐a‐service (e. g. Microsoft Office 365). Furthermore, clouds offer an enormous computing capacity, since they are not constrained by the limitations of a local server or network. Instead, cloud computing offers every network‐compatible device access to an almost unlimited pool of processing and storage resources.

  Beyond advancing performance, researchers are exploring the development of new models for practical applications.

  One major area of application for machine learning is predictive maintenance in the context of Industry 4.0 and the smart factory. ThyssenKrupp, CGI and Microsoft, for example, have jointly developed a system that networks thousands of sensors and control systems in elevators directly to the cloud. Data from the elevators is fed into dynamic prediction models, giving Microsoft Azure ML uninterrupted access to current datasets. The data are then transferred to a dashboard, where continuously updated KPIs (key performance indicators) [4] can be viewed live on computers and mobile devices. This allows ThyssenKrupp to monitor things like elevator speed and door operation around the clock with minimal effort and expenditure [28].

 
