Mobile Monitoring Solutions


The Story behind Very Fast iPhone XS JavaScript Performance

MMS Founder
MMS RSS

Article originally posted on InfoQ. Visit InfoQ

Initial reviews of JavaScript performance benchmarks show the iPhone XS and iPhone XS Max outperforming even an iMac Pro on the Speedometer 2.0 benchmark, which measures performance in real-world framework loading scenarios.

David Heinemeier Hansson, creator of Ruby on Rails, founder & CTO at Basecamp, started a discussion on Twitter:

The iPhone XS is faster than an iMac Pro on the Speedometer 2.0 JavaScript benchmark. It’s the fastest device I’ve ever tested. Insane 45% jump over the iPhone 8/X chip. How does Apple do it?!

This tweet led to significant speculation on how this performance improvement is possible.

Rafael Oliveira, CTO of curiosity.ai, remarked that Apple seems to optimize their JS compiler for ARM far more than they do for the x86 found in Mac desktops:

But that’s the point – Apple doesn’t seem to optimize the x86 Safari that much anymore, but their ARM JS compiler gets a lot of love… Not saying their CPUs aren’t impressive, just that the difference seems to be exaggerated when benchmarking JS.

Software engineer Greg Parker pointed out that the latest ARM instruction sets, ARMv8.3-A, make specific improvements to JavaScript performance:

ARMv8.3 adds a new float-to-int instruction with errors and out-of-range values handled the way that JavaScript wants. The previous instructions to get JavaScript’s semantics were much slower. JavaScript’s numbers are doubles by default, so it needs this conversion a lot.

ARM did indeed release many updates known as ARMv8.3-A, including improvements to JavaScript data type conversion:

JavaScript uses the double-precision floating-point format for all numbers. However, it needs to convert this common number format to 32-bit integers in order to perform bit-wise operations. Conversions from double-precision float to integer, as well as the need to check if the number converted really was an integer, are therefore relatively common occurrences.

The ARMv8.3-A instructions help convert a double-precision floating-point number to a signed 32-bit integer to improve performance.

However, it turns out that this was not the reason for the performance improvement, as software engineer Saam Barati explains that iOS 12 Safari does not yet leverage these instructions.

Since the initial reports, an improvement to leverage ARMv8.3-A instructions has landed in WebKit nightlies and is expected in future versions of Safari. Notably, Barati cites performance improvements in several JetStream2 test results:

 15% better on stanford-crypto-aes
 30% better on stanford-crypto-pbkdf2
 97% better on stanford-crypto-sha256

Overall, the fix is expected to add a further 0.5-2% performance improvement over the current version of Safari, which lacks ARMv8.3-A support.

Per a detailed iPhone XS performance report from Anandtech, the most likely reason for the significant improvement in JavaScript performance is the new memory subsystem:

403.gcc partly, and most valid for 429.mcf, 471.omnetpp, 473.astar and 483.xalancbmk are sensible to the memory subsystem and this is where the A12 just has astounding performance gains from 30 to 42%. It’s clear that the new cache hierarchy and memory subsystem has greatly paid off here as Apple was able to pull off one of the most major performance jumps in recent generations.

So while the ARM improvements are useful to computationally expensive JavaScript operations, they are not as significant as the memory subsystem improvements.

As a result, many are asking when Apple will switch from Intel to ARM for future desktop and laptop Macs, and whether that move would bring a similar improvement over today’s performance.


The Two (Conflicting) Definitions of AI

MMS Founder
MMS RSS

Article originally posted on Data Science Central. Visit Data Science Central

Summary: There are two definitions of AI currently in use, the popular definition and the data science definition, and they conflict in fundamental ways. If you’re going to explain or recommend AI to a non-data scientist, it’s important to understand the difference.

 

For a profession as concerned with accuracy as we are, we do a really poor job at naming things, or at least being consistent in the naming.  “Big Data” – totally misleading (since it incorporates velocity and variety in addition to volume).  How many times have you had to correct someone on that?

And look back at all the things we’ve called ourselves since the late 90’s.  These names don’t describe different outcomes or even really different techniques.  We’re still finding the signal in the data with supervised and unsupervised machine learning.

So now we have Artificial Intelligence (AI) for which there are at least two competing definitions, the popular one and the one understood by data scientists.  And that doesn’t even account for the dozens of Venn diagrams trying to describe which is a subset of what and all basically in conflict.

I’m sure by now you’ve heard the old joke.  What’s the definition of AI?

When you’re talking to a customer it’s AI.

When you’re talking to a VC it’s machine learning.

When you’re talking to a data scientist it’s statistics.

It would be even funnier if it weren’t true, but it is.

So it’s a worthwhile conversation to go directly at these two definitions and see where they conflict, and where if anywhere they converge.

 

The Popular Definition

This definition got underway 12 or 18 months ago and seems to have unstoppable momentum.  In my opinion that’s too bad since it’s misleading in many respects.  Gathered from a variety of sources and distilled here the popular definition of AI is:

Anything that makes a decision or takes an action that a human used to take, or helps a human make a decision or take an action.

The main problem with this is that it describes everything we do in data science including every technique of machine learning we’ve been using since the 90s.

As I gathered up different versions of this to distill for you here it became apparent that there are four different groups promoting this meme.

  • AI Researchers: They’re getting all the press and they want to claim ‘machine learning’ as something unique to AI.
  • The Popular Press: They’re just confused and can’t tell the difference.
  • Customers: Who increasingly ask ‘give me some of that AI’.
  • Platform and Analytics Vendors: If customers want AI then we’ll just call everything AI and everyone will be happy.

 

The Data Scientist’s Definition

Those of us professionally involved in all these techniques know that a set of new or expanded techniques evolved over the last ten years.  These included deep neural nets and reinforcement learning.

These aren’t radically new techniques, since they grew out of neural nets that had been in our toolbox for a long time, but they blew up on the steroids of MPP (massively parallel processing brought by NoSQL Hadoop), GPUs, and vastly expanded cloud compute.

When you looked at these from the perspective of the AI founders like Turing, Goertzel, and Nilsson you could see these newly expanded capabilities as the eyes, ears, mouth, hands, and cognitive ability that started to add up to their vision of what artificial intelligence was supposed to be able to do.

 

Data scientists understand that the definition of AI as we practice it today is really a collection of the six unique techniques above, some more advanced toward commercial readiness than others.

 

Is There Any Common Ground?

It’s narrow, but there is some common ground between these two definitions.  That’s primarily in the backstory for AI.  The popular press has mostly represented that AI is something brand new but the correct way to look at this is as an evolution over time.

 

I think we all understand that we stand on the shoulders of those who came before. Even as far back as the 90’s, we were building hand-crafted decision trees that we called expert systems to take the place of human decision making in complex situations.

Once you understand that the popular definition wants to include everything that makes a decision, then it’s easy to see the progression through machine learning and Big Data into deep learning.

One place where the casual reader needs to be careful though is in understanding what elements of AI are commercially ready.  Among the six techniques or technologies that make up AI, only CNNs and RNN/LSTMs for image, video, text, and speech are at commercially acceptable performance levels.

What you may need to explain to your executive sponsors is that these six ‘true’ AI methods are still the bleeding edge of our capabilities.  Projects based on these are high cost, high effort, and higher risk. 

The conclusion ought to be that there are many business solutions that can be based on machine learning without involving true AI methods. As more third-party vendors create industry- or process-specific solutions using these new techniques, the risk will diminish, but that’s not the case today.

For the rest of us, the conflict of definitions remains. When someone asks you about AI, you’re still going to need to ask ‘what do you mean by that?’

 

 

Other articles by Bill Vorhies.

 

About the author:  Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.  He can be reached at:

Bill@DataScienceCentral.com or Bill@Data-Magnum.com

 



Feedback for Skill Development and Careers

MMS Founder
MMS RSS

Article originally posted on InfoQ. Visit InfoQ

Feedback and continuous learning are crucial for personal and professional development. Non-technical skills like creative problem solving, critical thinking, and an entrepreneurial mindset are important to make progress in your career. You have to own your career direction and know what you ultimately want to be in order to decide on the next steps.

In the morning sessions at Women in Tech Dublin 2018, presenters spoke about how to use feedback to develop your skills and work on your career.

Caroline O’Reilly, senior director of engineering at Workday, presented Rewiring Yourself for Success. Feedback is important to learn how you are doing and develop yourself, she said. Instead of using the word “feedback”, which can be perceived as stressful or negative, she prefers to call it “advice”, emphasizing the positive aspects and the aim of it, which is to get better.

Trust and empathy are essential in a great team. People in teams have to share the same values, and you have to hire for values, O’Reilly said. She argued that you have to build a great culture and diversity from the beginning; you can’t retrospectively build it in.

Julie Spillane, managing director at Accenture, explored how to apply the principles of innovation to our own career development in her talk Fuelling Your Journey. Professionals need to work on their career as the demand for technical skills is changing fast. People need to have new skills which didn’t exist five years ago, said Spillane. Also, non-technical skills are important; Spillane mentioned skills like creative problem solving, critical thinking, and an entrepreneurial mindset.

Spillane said that the future will bring a more flexible workforce and career models. Her advice for professionals is to prepare themselves by focusing on the mind, voice, and heart. Professionals are expected to have a growth mindset, the belief that fundamental ability can be developed over time through dedication. She suggested seeking out opportunities to learn and develop yourself. Don’t see risk-taking or problem-solving as a negative thing; think of it as an opportunity, she said.

Regarding voice, she emphasized to the audience to be authentic, to make your voice heard and know your story. You have to know the pivotal moments in your career, where it has gone wrong, and what has gone really well.

At the heart of it, as technology becomes more ingrained it’s important to work on the connection between people. Seek out people who are different from you; it’s a great way to learn, she said. Look out for each other, help each other to find your voice, and build your skills to reach full potential, was her advice.

Sarah Cunningham, vice president of technology, Dublin technology hub at MasterCard, spoke about owning your career. She started her talk by stating, “If you don’t own your career, nobody else will.” You have to own your direction and know your north star, what you ultimately want to be. A north star is a movable goal, she said; it’s ok to change your direction along the way.

Professionals need to invest in continuous learning. They have to be open to new pathways and dare to take them, argued Cunningham. She suggested figuring out what skills you need to get from here to there, and constantly refreshing this list to stay relevant as new technologies and skillsets emerge.

If there’s a committee or cause at work that you are passionate about, step up to it, she suggested. If you are too busy now, revisit your decision when you have time. If you want to have a great peer mentor, be a great mentor; it works both ways, she said. Not every career move needs to be upward; sometimes a lateral move can take you further.

Feedback is crucial for personal and professional development, said Cunningham. She warned not to fall into the trap of fixating on every bit of feedback you receive, and to distinguish the “value add” from “feedback for the sake of feedback”.

InfoQ is covering Women in Tech Dublin with summaries and Q&As.



Article: Sentiment Analysis: What's with the Tone?

MMS Founder
MMS RSS

Article originally posted on InfoQ. Visit InfoQ

Key Takeaways

  • A recent trend in the analysis of texts goes beyond topic detection and tries to identify the emotion behind a text. This is called sentiment analysis, also known as opinion mining or emotion AI.
  • Sentiment analysis is widely applied in voice of the customer (VOC) applications, such as analyzing responses in a questionnaire or free comments in a review.
  • Extracting sentiment from a text can be done using techniques like natural language processing, computational linguistics, and text mining.
  • Text mining can be performed using Machine Learning (ML) or a lexicon based approach.
  • Lexicon-based approach relies on the words in the text and the sentiment they carry. This technique uses NLP concepts and a dictionary to extract the tone of the conversation.
  • ML-based approach needs a sentiment-labeled collection of documents; this is a collection in which each document has been manually evaluated and labeled in terms of sentiment. After some preprocessing, a supervised ML algorithm is trained to recognize the sentiment in each text.
     

Besides understanding what people are talking about, it is sometimes important to understand the tone of the conversation.

A relatively more recent trend in the analysis of text goes beyond topic detection and tries to identify the emotion behind a text. This is called sentiment analysis, also known as opinion mining or emotion AI.

For example, the sentence “I love chocolate” is very positive in regard to chocolate as food. “I hate this new phone” also gives a clear indication of the customer preferences about the product. In these two particular cases, the words “love” and “hate” carry a clear sentiment polarity. A more complex case could be the sentence “I do not like the new phone,” where the positive polarity of “like” is reversed into a negative polarity by the negation. The same is true for “I do not dislike chocolate,” where the negation of a negative word such as “dislike” creates a positive sentence.

The polarity of a word can be context dependent. “These mushrooms are edible” is a positive sentence in regard to health. However, “This steak is edible” is a negative sentence when referring to a restaurant. At times, the polarity of a word is delimited in time, such as “I like to travel, sometimes,” where “sometimes” limits the positive polarity of the word “like.” And so on to even more subtle examples like “I do not think this chocolate is really great” or even worse “Do you really think this concert was so fantastic?”

We have talked here about positive and negative sentiment. However, positive and negative are not the only labels you can use to define sentiment in a sentence. Usually the whole range — very negative, negative, neutral, positive and very positive — is used. Additional, less obvious labels may also be used, like irony, understatement, uncertainty, etc.

A typical use case is feedback analysis. Depending on the tone of the feedback — upset, very upset, neutral, happy and very happy — the feedback takes a different path in a support center.

Sentiment analysis is indeed widely applied in voice of the customer (VOC) applications. For example, when analyzing responses in a questionnaire or free comments in a review, it is extremely useful to know the emotion behind them in addition to the topic. A disgruntled customer will be handled in a different way from an enthusiastic advocate. From the VOC domain, the step to applications for healthcare patients or for political polls is quite short.

Similarly, the number of negative vs. positive comments can decide the future of a YouTube video or a Netflix movie.

How can we extract sentiment from a text? Sometimes even humans are not that sure of the real emotion when reading between the lines. Even if we manage to extract the feature associated with sentiment, how can we measure it? There are a number of approaches to do that, involving Natural Language Processing (NLP), computational linguistics, and text mining. We will focus in this article on the two most common approaches: a machine learning (ML) approach and a lexicon-based approach.


The lexicon-based approach relies on the words in the text and the sentiment they carry. This technique uses NLP concepts and a dictionary to extract the tone of the conversation.

The ML-based approach needs a sentiment-labeled collection of documents; this is a collection in which each document has been manually evaluated and labeled in terms of sentiment. After some preprocessing, a supervised ML algorithm is trained to recognize the sentiment in each text.

What You Need

KNIME Analytics Platform – We’ll use KNIME data analysis tools to show how to develop a sentiment analysis solution.

KNIME Analytics Platform is an open source software for data science for data scientists, data analysts, big data users, and business analysts. It covers all your data needs from data ingestion and data blending to data visualization, from machine learning algorithms to data wrangling, from reporting to deployment, and more. It is based on a Graphical User Interface for visual programming, which makes it very intuitive and easy to use, considerably reducing the learning time.

It has been designed to be open to different data formats, data types, data sources, data platforms, as well as external tools, like Apache Tika libraries, Keras and Python for example. It also includes a number of extensions for the analysis of unstructured data, like texts or graphs.

For text processing, the KNIME Text Processing extension offers a wide variety of IO, cleaning, processing, stemming, keyword extraction, and more text processing related nodes.

Given all those characteristics – open source, visual programming, and codeless Text Processing integration – we have selected it to implement a sentiment analysis application.

Computing units in KNIME Analytics Platform are small colorful blocks, named “nodes”. Assembling nodes in a pipeline, one after the other, implements a data processing application. A pipeline is called a “workflow” (Figure 1).

KNIME Analytics Platform is open source. It can be downloaded and used for free.

Download the installable package for your operating system from the KNIME site, then install it following the accompanying video instructions.

The IMDb data set – To evaluate sentiment in sentences or texts, we need some examples, of course. We used here the data set of movie reviews provided by IMDb. This data set includes 50,000 movie reviews, each one manually labeled for sentiment. Sentiment classes are balanced: 25,000 are negative reviews and 25,000 are positive reviews.

If we pursue the NLP lexicon-based approach, we also need a dictionary of words with the sentiment they carry; that is, at least a list of negative words and a list of positive words. We got those lists from the MPQA Corpus.

If we pursue the machine learning-based approach, we need a sentiment label for each of our text examples. The IMDb data set provides a positive vs. negative label, manually evaluated for each review.

The Workflows

NLP-based Sentiment Analysis – The workflow for the lexicon-based sentiment analysis needs to:

  • Clean and standardize the texts in the document collection.
  • Tag all words as positive or negative according to the dictionary lists provided by the MPQA Corpus. (To do that we use the Dictionary Tagger node twice. All other words are removed.)
  • Extract all remaining words from each document with a Create BoW node.
  • Calculate the sentiment score for each document as:
    • Sentiment score = (# positive words – # negative words) / (# words in document)
  • Define a threshold value as the average sentiment score.
  • Subsequently classify documents as:
    • positive if sentiment score > threshold
    • negative otherwise
  • If you want to be more cautious, you can define positive and negative thresholds as:
    • thresholds = avg (sentiment score) ± stddev (sentiment score)

Thus, all documents with sentiment score in between the two thresholds can be classified as neutral.

Figure 1. NLP lexicon-based approach to sentiment analysis. Here you need a set of NLP-based rules, which in the simplistic case is based on just two lists of words: one list for positive words and one list for negative words. Here the two lists of words are taken from the MPQA Corpus.
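To make the scoring rule concrete, here is a minimal, self-contained Python sketch of the same logic. The word lists, tokenization, and threshold handling below are simplified stand-ins for the MPQA dictionaries and the Dictionary Tagger/BoW nodes of the actual workflow, not the KNIME implementation itself.

```python
import statistics

# Toy stand-ins for the MPQA positive/negative word lists (assumptions for illustration).
POSITIVE = {"love", "great", "like", "fantastic"}
NEGATIVE = {"hate", "dislike", "bad", "terrible"}

def sentiment_score(text: str) -> float:
    """(# positive words - # negative words) / (# words in document)."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words) if words else 0.0

def classify(reviews):
    scores = [sentiment_score(r) for r in reviews]
    mean, std = statistics.mean(scores), statistics.pstdev(scores)
    labels = []
    for s in scores:
        if s > mean + std:
            labels.append("positive")
        elif s < mean - std:
            labels.append("negative")
        else:
            labels.append("neutral")   # the cautious, two-threshold variant
    return labels

print(classify(["I love this fantastic movie", "I hate this terrible plot", "It was a movie"]))
```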

If we assign colors to the IMDb reviews according to the predicted sentiment — green for positive and red for negative sentiment — we get the result table in Figure 2.

Note: This is a very crude calculation of the sentiment score. Of course, more complex rules could be applied, for example, inverting the word sentiment polarity after a negation and taking into account the time evolution of the sentence.

Figure 2. Movie reviews with predicted sentiment from NLP-based approach: red for negative and green for positive sentiment.

The workflow in Figure 1, with just a sample of the original IMDb data set, is downloadable for free from the KNIME examples under

Other Analytics Types/Text Processing/Sentiment Analysis Lexicon based Approach

or within KNIME Analytics Platform in the EXAMPLES list under:

08_Other_Analytics_Types/01_Text_Processing/26_Sentiment_Analysis_Lexicon_Based_Approach (see video on how to connect and download workflows from the KNIME EXAMPLES Server).

ML-based Sentiment Analysis – The application implementing the machine learning-based approach to sentiment analysis needs to:

  • Again clean and standardize the texts in the documents.
  • Extract all words from the documents with a Create BoW node.
  • Produce a text vectorization of the document with a Document Vector node.
  • Train a ML algorithm to recognize positive vs. negative texts.
  • Evaluate the created model on test documents.

Note: The training and test phase is implemented exactly as in any other machine learning-based analysis. Here we used a decision tree because the data set is quite small, but any other supervised ML algorithm could be used: deep learning, random forest, SVM or any other.

Figure 3. Machine learning-based approach for sentiment analysis. Here we train a decision tree, but any other supervised ML model for classification could be used.
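For readers who prefer to see the steps in code, the following is a rough scikit-learn equivalent of the same steps: bag of words, 0/1 document vectors, a stratified partition, and a decision tree. The toy reviews and labels are invented for illustration; in the article the data comes from the IMDb review set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed in-memory data; in the workflow the texts come from the IMDb reviews.
texts = ["I love this movie", "Terrible plot and bad acting", "Great cast, great story", "I hate it"]
labels = ["positive", "negative", "positive", "negative"]

# Bag of words with 0/1 document vectors (binary=True mimics one-hot presence encoding).
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)

# 70/30 stratified partition, as in the workflow.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=42)

# Train the classifier and predict on the held-out documents.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
predictions = model.predict(X_test)
```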

Instead of extracting all words from the text, in the interest of execution speed and input table size, we could extract only the main keywords with one of the keyword extractor nodes. Keyword extraction, however, could limit the original word set and cut out important sentiment-related words, which might lead to diminished sentiment classification performance. If you decide to go with a smaller set of words, just make sure that performance is not badly affected.

The Document Vector node, following the preprocessing stage, performs a one-hot encoding of the input texts, without preserving the order of the words in the sentence.

The Scorer node at the end of the workflow calculates a number of accuracy measures on the test set produced earlier by the Partitioning node. The final accuracy is around 71 percent and Cohen’s kappa is 0.42, based on a random stratified partition of 70 percent vs. 30 percent of the original data for the training set and test set respectively.
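The two figures reported by the Scorer node correspond to standard metrics; a small, self-contained sketch of how they can be computed, with invented gold labels and predictions, looks like this:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical gold labels and model predictions for a handful of test reviews.
y_test      = ["positive", "negative", "positive", "negative", "positive"]
predictions = ["positive", "negative", "negative", "negative", "positive"]

print("Accuracy:", accuracy_score(y_test, predictions))          # fraction of correct predictions
print("Cohen's kappa:", cohen_kappa_score(y_test, predictions))  # agreement corrected for chance
```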

Usually, the machine learning-based approach performs better than the dictionary-based approach, especially when using the simple sentiment score adopted in our NLP approach. However, sometimes there is no choice because a sentiment-labeled data set is not available.

This workflow is downloadable for free from the KNIME examples under

Other Analytics Types/Text Processing/Sentiment Classification

or within KNIME Analytics Platform in the EXAMPLES list under:

 08_Other_Analytics_Types/01_Text_Processing/03_Sentiment_Classification

(see video on how to connect and download workflows from the KNIME EXAMPLES Server)

Deployment

The deployment workflow for an NLP-based sentiment analysis is practically the same as the training workflow in Figure 1. The only difference is in the threshold calculation: the threshold is computed only in the training workflow, on the training set, and is then simply reused in the deployment workflow.

The deployment workflow for a machine learning-based sentiment analysis looks like any other ML-based deployment workflow. Data are imported and preprocessed as needed, the model is acquired, and data are fed into the model to produce predictions that are presented to the end user.

Word Sequences and Deep Learning

The text vectorization used in the ML-based approach transforms words into vectors of 0/1 (one-hot encoding), where 1 shows the presence of a word and 0 its absence. The time sequence of the words in the sentence is not necessarily preserved. This is also acceptable as long as we use ML algorithms that do not take the sequence order into account, for example, a decision tree.

A variation of the one-hot encoding is the frequency-based encoding, where instead of using 0/1 for absence/presence of a word, the word frequency is used for word presence and 0 again for word absence.

Another type of encoding is index-based. In this case a word is coded by means of an ID, usually a progressive integer index.
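A short sketch can make the three encodings concrete. The vocabulary and sentence below are invented for illustration:

```python
from collections import Counter

vocabulary = ["chocolate", "do", "i", "like", "love", "not", "phone", "the"]
sentence = "i do not like the phone".split()

index = {word: i for i, word in enumerate(vocabulary)}
counts = Counter(sentence)

# One-hot (presence/absence) encoding: 1 if the word occurs in the sentence, 0 otherwise.
one_hot = [1 if w in counts else 0 for w in vocabulary]

# Frequency-based encoding: the word count replaces the 1.
frequency = [counts[w] for w in vocabulary]

# Index-based encoding: each word is replaced by its integer ID, preserving word order.
index_encoded = [index[w] for w in sentence]

print(one_hot)        # [0, 1, 1, 1, 0, 1, 1, 1]
print(frequency)      # [0, 1, 1, 1, 0, 1, 1, 1]
print(index_encoded)  # [2, 1, 5, 3, 7, 6]
```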

One special machine learning algorithm that works well for sentiment analysis is a deep learning network with a Long Short-Term Memory (LSTM) layer. Indeed, Recurrent Neural Networks (RNN), and especially LSTM networks, have recently been used to explore the dynamics of time series evolution. We could use them to explore the dynamics of word sequences to better predict text sentiment as well.

LSTM units and layers are available in the KNIME Analytics Platform through the KNIME Deep Learning Extension – Keras Integration.

The KNIME Deep Learning Extension integrates deep learning functionalities, networks and architectures from TensorFlow and Keras in Python. In addition to letting you write Python code to run the TensorFlow/Keras libraries, this extension allows you to assemble, train and apply Keras networks through the traditional KNIME Graphical User Interface, based on nodes, drag and drop, clicks, configuration windows, and traffic-light status. This last option makes the whole assembling and training process codeless and, therefore, much easier and faster, especially for prototyping and experimentation. A mix-and-match approach is also possible.

Indeed, to build a deep learning network with an input layer, an embedding layer, an LSTM layer, and a dense layer, we just need four nodes:

  • Keras Input Layer,
  • Keras Embedding Layer,
  • Keras LSTM Layer, and
  • Keras Dense Layer nodes (Figure 4).

The network training is obtained via the Keras Network Learner node (Figure 4), which also includes the conversion to the appropriate encoding, as required by the first layer, and the selection of the loss function (binary cross-entropy).
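In plain Keras code, the four-layer architecture assembled by these nodes corresponds roughly to the sketch below. The vocabulary size, sequence length, layer sizes, and toy index-encoded reviews are assumptions for illustration; in KNIME the equivalent settings are made through the node dialogs.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE, MAX_LEN = 20_000, 80   # assumed vocabulary size and sequence length

# Hypothetical index-encoded reviews, truncated/zero-padded to the same length.
reviews = [[12, 7, 256, 3], [45, 2, 9, 881, 23, 4]]
labels = np.array([1, 0])          # 1 = positive, 0 = negative
x = pad_sequences(reviews, maxlen=MAX_LEN)

# Input -> Embedding -> LSTM -> Dense, mirroring the four Keras layer nodes.
model = Sequential([
    Embedding(input_dim=VOCAB_SIZE, output_dim=128),
    LSTM(64),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, labels, epochs=1, batch_size=2)   # training step, as done by the Keras Network Learner node
predictions = model.predict(x)                 # applying the trained network to data
```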

Finally, the network application to new data is implemented with the generic DL Network Executor node, which takes any trained Keras or TensorFlow network and executes it on the new data.

Notice that the DL Network Executor node is not the only node able to execute a TensorFlow/Keras deep learning network. The DL Python Network Executor node can also do that. The difference between the two nodes is in the usage of GUI or script. Configuration settings in the first node are passed via GUI, while configuration in the second node is just a Python script. The first node is then easier to use, but, of course, less flexible. The second node requires Python code knowledge but can customize the network execution in more detail.

And here we are! We have assembled, trained and applied a four-layer neural network, including an LSTM layer, with just six Keras deep learning nodes.

Since LSTM units are capable of learning evolution over time — that is evolution over an ordered sequence of input vectors — this might turn out to be useful when learning negations or other language structures where word order in the sequence is important.

Figure 4. Keras-based deep learning network for sentiment analysis. The first four gray metanodes in the workflow preprocess the texts to build same-length sequences of index-encoded words. Truncation and zero padding are applied to cut or fill up sentences that are too long or too short, respectively. The result of such preprocessing, which is the input to the neural network, can be seen in Figure 5. The network consists of four layers: an input layer, an embedding layer, an LSTM layer, and a dense layer, as you can see from the top brown nodes. This network structure is then trained by the Keras Network Learner node and applied via the DL Network Executor node.

Figure 5. Index-encoded input sequences of words fed to the deep learning network.

Accuracy and Cohen’s kappa for this deep learning network, evaluated on the test set with the same partition used for the decision tree, are 81 percent and 0.62 respectively.

In this example, we worked on a small data set with only 50,000 reviews. On such a data set, the decision tree already performs quite satisfactorily (71 percent accuracy), but the deep learning network adds a 10 percentage point improvement (81 percent accuracy). In any case, we hope it was useful to see the practical steps of the machine learning approach implemented with both algorithms. The deep learning workflow, with the entire IMDb dataset, is downloadable for free from the KNIME public EXAMPLES server under:

04_Analytics/14_Deep_Learning/02_Keras/08_Sentiment_Classification_with_Deep_Learning_KNIME_nodes

Conclusions

We have described two basic techniques for sentiment analysis.

The first one is NLP-based and requires a dictionary of sentiment-labeled words and a set of more or less complex rules to determine a sentence sentiment from the words and grammar in it. Rules might be complex to establish (negations, sarcasm, dependent sentences, etc.), but in the absence of a sentiment-labeled data set, sometimes this is the only viable option.

The second one is ML-based. Here we train an ML model to recognize sentiment in a sentence based on the words in it and a sentiment-labeled training set. This approach depends heavily on the ML algorithm used and the document numeric representation. One current trend is to use time sequence algorithms to recognize not just the words but also their respective order in the sentence. As an example of this particular approach, we showed a deep learning neural network including an LSTM layer.

All workflows used in this article are available for free on the KNIME EXAMPLES public server under:

08_Other_Analytics_Types/01_Text_Processing and 04_Analytics/14_Deep_Learning/02_Keras

(see video on how to access and download workflows from the KNIME EXAMPLES Server)

About the Authors

Rosaria Silipo, Ph.D., principal data scientist at KNIME, is the author of 13 technical publications, including her most recent book “Practicing Data Science: A Collection of Case Studies”. She will be conducting a free webinar, “Sentiment Analysis: Deep Learning, Machine Learning, Lexicon Based,” on November 27, 2018, at 9 a.m. Pacific time/12 p.m. Eastern time/6 p.m. Central European time. She holds a doctorate degree in bio-engineering and has spent most of her professional life working on data science projects for companies in a broad range of fields, including IoT, customer intelligence, the financial industry, and cybersecurity. Follow Rosaria on Twitter, LinkedIn and the KNIME blog.

Kathrin Melcher is a data scientist at KNIME. She holds a master’s degree in mathematics from the University of Konstanz, Germany. She enjoys teaching and applying her knowledge to data science, machine learning and algorithms. Follow Kathrin on LinkedIn and the KNIME blog.



Big Data Visualization: How AR and VR Is Transforming Data Interpretation

MMS Founder
MMS RSS

Article originally posted on Data Science Central. Visit Data Science Central

Every day, we create 2.5 quintillion bytes of data, and over the last two years alone we have created over 90% of the world’s data! The amount of data we produce is staggering, making it difficult for data analysts all over the world to analyze it and project the results of that analysis into an understandable form.

The Ever-Growing Big Data

Though there is a lot of data at hand, enterprises aren’t exploiting it to its full potential. In fact, according to Forrester, between 60% and 73% of all data within an enterprise goes unused for analytics. While this is the case for enterprises, the percentage will be even higher for small businesses, indicating that there is a lot of unused data in the world.

The main hindrance to using such data is the difficulty of perceiving it as facts, which makes its usefulness dependent on visualization techniques. Pie charts and bar graphs are used to turn these data into a readable form, but they have limitations. Such 2D representations of big data analysis restrict our imagination and, therefore, the conclusions we draw from the data gathered.

With big data gaining momentum exponentially, it’s important to get hold of the insights it contains. What is the point of having so much data if we cannot draw a proper, coherent interpretation from it?

Visualizing Big Data Through AR and VR

The broader the canvas of visualization, the better the understanding. That’s exactly what happens when one visualizes big data through Augmented Reality (AR) and Virtual Reality (VR). A combination of AR and VR could open a world of possibilities to better utilize the data at hand. VR and AR can practically improve the way data is perceived and could actually be the solution for making use of the large amounts of unused data.

By presenting the data in 3D, the user will be able to decipher the major takeaways from the data better and faster, with easier understanding. Recent research shows that VR and AR have a high sensory impact, which promotes faster learning and understanding.

This immersive way of representing the data enables analysts to handle big data more efficiently. It makes analysis and interpretation more of an experience and realization than traditional analysis. Instead of seeing numbers and figures, the user will be able to see beyond them and into the facts, happenings and reasons, which could revolutionize businesses.

You can cross-reference data and figures across years and across many factors with far more ease than when the data is scattered across a pie chart or a bar graph. As you observe the data in 3D, you may want to see how it correlates with a past incident or a strategy. You can easily pull that data, see the correlation, compare, and gain a whole lot of perspective from it. VR and AR can greatly help in the multi-dimensional analysis of your business problems and in reaching solutions sooner.

Every piece of data is trying to indicate something. When you can actually see the data swimming before your own eyes, it becomes simpler to understand what it says. You can stack the data, move it around in front of your eyes, arrange it as you like, and do whatever you want with it, like a physical set of beads.

Interact with Data – Literally!

As data was previously seen on computer screens, it seemed more theoretical and less interactive. But when you can literally stand in the middle of your data, wouldn’t you be able to comprehend it better?

With the data projected on X, Y and Z axes, with hundreds or sometimes thousands of data points marked all around and various factors governing them, you can see plainly what these data collectively convey. Based on the size and color of the data points, you can differentiate between them, compare them with other data from the past, or update them with new data.

You can actually point at the data, shuffle it, organize it, and highlight it, just as if you were handling it physically. You could point your finger at a group of data and say, “these are the reason why our last marketing campaign failed,” and it would totally make sense too!

When you start seeing the data physically, like some sort of object, you can see clearly what the data is trying to represent. Another advantage of VR-based data visualization is that the user can focus more clearly on the data and its interpretation, which will further improve the conclusions derived.

Visualizing big data with VR and AR can drastically cut down the time spent on data analysis and, what’s more, yields comparatively better insights from the data. Big data visualization with AR and VR is the only way to avoid wasting the massive amount of data generated and gathered, and to gain a better understanding of it.



Microsoft Announces Container Support for Azure Cognitive Services

MMS Founder
MMS RSS

Article originally posted on InfoQ. Visit InfoQ

Microsoft has announced container support for Cognitive Services, which allows taking advantage of machine learning capabilities anywhere, whether it is in the cloud, on the edge or on-premises. With Azure Cognitive Services, organizations can start using various cognitive features, like vision, speech and text processing, without the need for a dedicated data scientist.

Support for containerization is accomplished by giving the option to deploy pre-built models as Docker containers, allowing these to run wherever Docker is available. Microsoft’s announcement closely follows Google’s launch of Kubeflow Pipelines, which provides support for machine learning through Kubernetes containers. It is logical that these services focus on providing the foundations for ML platforms, allowing companies and developers to concentrate instead on harnessing the actual value which ML and AI can bring.

Source: https://venturebeat.com/2018/11/24/before-you-launch-your-machine-learning-model-start-with-an-mvp/

Now that Azure Cognitive Services can run in containers, there is no longer a need to send data to Azure for these models, as they can run in any cloud or on the edge. Subsequently, as data no longer must leave the on-premises environment, this also opens possibilities to process data which cannot be used in Azure due to privacy or regulatory restrictions. Moreover, scenarios with massive data loads, which would be either too expensive or too time-consuming to bring to the cloud, can now be processed on the edge as well, while taking advantage of the power of Cognitive Services and the scaling of Docker containers. These statements are confirmed by Eric Boyd, Corporate Vice President of Azure AI.

With container support, customers can use Azure’s intelligent Cognitive Services capabilities, wherever the data resides. This means customers can perform facial recognition, OCR, or text analytics operations without sending their content to the cloud. Their intelligent apps are portable and scale with greater consistency whether they run on the edge or in Azure.

Initially, five of Azure’s Cognitive Services are available through containers, with more expected to follow later.

To get started with Cognitive Services in containers, either sign up for the face and text recognition services or start immediately using one of the other services. The images are available from the Microsoft Container Registry or Docker Hub, and after pulling them, they can be configured and used in a Docker environment.
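Once a container is running locally, applications talk to it over the same REST interface as the cloud service. The sketch below is purely illustrative: the host, port, endpoint path, and payload shape are assumptions modeled on the Text Analytics sentiment container and may differ per service and version.

```python
import json
import urllib.request

# Hypothetical local endpoint for a sentiment container; host, port and path are assumptions.
url = "http://localhost:5000/text/analytics/v2.0/sentiment"
payload = {"documents": [{"id": "1", "language": "en", "text": "The new release works great."}]}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))   # e.g. a sentiment score per document
```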

Finally, it is important to note that, even though the containerized services send none of the processed data to Azure, a connection is still required on start-up and at regular intervals. Metrics for billing are sent over this connection, and the costs are currently the same whether using Azure or the container images.



InfoQ's New Desktop Design: Public Beta and Video Tour

MMS Founder
MMS RSS

Article originally posted on InfoQ. Visit InfoQ

[embedded content]

We’ve made a number of changes to InfoQ’s desktop site, the third major overhaul of our design since we got started in 2006. The above video provides a quick overview of the new design and re-architected back-end features. You can switch to the new design now to try it out; this switcher just puts a cookie on your machine to tell us where to route you. Do please keep in mind that we are still in beta, and are making updates and changes based on your feedback. In some situations it may still be necessary to take the re-designed site down for a short period of time.

The new design is component-based and represents a common design language for InfoQ and QCon, so over the coming months you’ll see more convergence of the look and feel between InfoQ and the various QCon sites, as well as in areas like the newsletters we produce. The intention is that by using consistent design patterns we help build familiarity and understanding.

As part of this work we’ve simplified our typography, colours and iconography, using Roboto and Fira Sans as our two main fonts. We’ve worked hard to keep the new production distributable as small as possible – the common CSS is around 290KB, common JavaScript around 35KB, and typical pages are 45-55KB each. Overall, the new site is around 25% faster to load than before.

The new design is intended to be fully inclusive and conforms to the Web Content Accessibility Guidelines (WCAG). For prototyping, the design team used Sketch and InVision as the main design stack. From a technology stack point of view, we’re using Vue.js with Nuxt.js and Pug.js for templating/scripting, with Stylus & PostCSS for CSS processing and Webpack for packaging. The new CSS follows SMACSS and BEM – a scalable and modular architecture for CSS with Block Element Modifiers. Behind the scenes, we are also overhauling InfoQ’s back end, moving from an ageing JCR-based system to a new custom-built, microservice-like content repository running on AWS.

This update is just the beginning. The new design is fully responsive, and will be coming to mobile devices soon. We will also be continuing to work on improvements to content discovery, and making significant improvements to our video player, in the coming months.

When you switch over to the new design you’ll see a feedback widget that you can use to provide comments. Alternatively, feel free to leave long-form comments below or email us at feedback@infoq.com.



Google Introduces AI Hub and Kubeflow Pipelines for Easier ML Deployment

MMS Founder
MMS RSS

Article originally posted on InfoQ. Visit InfoQ

Google is launching two new tools, one proprietary and one open source: AI Hub and Kubeflow Pipelines. Both are designed to help data scientists design, launch and keep track of their machine learning algorithms.

With AI Hub and Kubeflow Pipelines, Google is following up on its earlier release of Cloud AutoML in January and continuing its strategy of simplifying and speeding up its customers’ ability to adopt Google’s AI techniques and services. Hussein Mehanna, engineering director of the Cloud ML Platform, wrote in a blog post:

Our goal is to put AI in reach of all businesses. But doing that means lowering the barriers to entry. That’s why we build all our AI offerings with three ideas in mind: make them simple, so more enterprises can adopt them, make them useful to the widest range of organizations, and make them fast, so businesses can iterate and succeed more quickly.

Google introduces AI Hub to put AI more within reach of businesses, making it easier for them to discover, share, and reuse existing tools and work. Moreover, AI Hub is a one-stop destination for ML content, such as pipelines, Jupyter notebooks, and TensorFlow modules. The benefits, according to Mehanna, are:

  • High-quality ML resources developed by Google Cloud AI, Google Research and other teams across Google are publicly available to all businesses.
  • Google provides a private, secure hub where enterprises can upload and share ML resources within their organizations. The hub makes it easy for businesses to reuse pipelines and deploy them to production in GCP—or on hybrid infrastructures using the Kubeflow Pipeline system—in just a few steps.

In addition to a central place where organizations can discover, share and reuse ML resources, Kubeflow Pipelines gives them a way to build and package their own. Kubeflow Pipelines is an extension of Kubeflow, an open-source framework developed on top of Kubernetes explicitly designed for machine learning. These pipelines are essentially containerized building blocks that users can string together to build and manage machine learning workflows.
 


Source: https://cloud.google.com/blog/products/ai-machine-learning/introducing-ai-hub-and-kubeflow-pipelines-making-ai-simpler-faster-and-more-useful-for-businesses

Essentially, Kubeflow Pipelines is an open-source workbench solution allowing users to compose, deploy and manage machine learning workflows. It offers interoperability and the flexibility to experiment with models before deploying them into production.
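As a rough illustration of what such a containerized building block looks like in code, the Kubeflow Pipelines Python SDK (the v1-style ContainerOp API) lets a pipeline be declared as a function of container steps. The image names, bucket path, and arguments below are placeholders for illustration, not part of Google's announcement.

```python
import kfp
import kfp.dsl as dsl

@dsl.pipeline(name="train-and-evaluate", description="Two containerized steps chained together.")
def train_pipeline(data_path: str = "gs://my-bucket/data.csv"):   # hypothetical bucket path
    # Each step is a container image plus arguments; steps can be ordered explicitly.
    preprocess = dsl.ContainerOp(
        name="preprocess",
        image="gcr.io/my-project/preprocess:latest",               # placeholder image
        arguments=["--input", data_path],
    )
    dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/train:latest",                    # placeholder image
        arguments=["--input", data_path],
    ).after(preprocess)

# Compile to an archive that can be uploaded to the Kubeflow Pipelines UI.
kfp.compiler.Compiler().compile(train_pipeline, "train_pipeline.tar.gz")
```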

Furthermore, Mehanna wrote in the blog post:

Organisations like Cisco and NVIDIA are among the key contributors to this open source project and are collaborating with us closely to adopt Kubeflow pipelines. NVIDIA is already underway integrating RAPIDS, a new suite of open source data science libraries, into Kubeflow. The RAPIDS library leverages GPUs to provide an order of magnitude speed-up for data pre-processing and machine learning, thus perfectly complementing Kubeflow.

Both AI Hub and Kubeflow Pipelines are meant to help data scientists share models across their organization. The early adopters of these tools will be organizations with a sizeable machine learning/data science function, and Rajen Sheth, director of product management for Cloud AI, expects them to come from the retail, financial services, manufacturing, healthcare and media sectors.



Data Analytics. AI. ML. What’s the Difference?

MMS Founder
MMS RSS

Article originally posted on Data Science Central. Visit Data Science Central

There are transformative technologies in the world today that consistently deliver on their promise to alter or change the ecosystem. Industries have transformed, and early adopters with them, while others race to understand how best to adapt or integrate these emerging technologies into their organizations in an effective and seamless manner.

Among these, artificial intelligence is far from being a new concept. As a technology, it’s been with us for a while now, but things have changed: cloud-based service options, the applicability of AI to several critical organizational functions, and the power of computing, among many other factors.

In fact, AI’s impact on several industries is predicted to grow quite rapidly and is expected to be in the high billions by 2025. AI, or artificial intelligence, is a buzzword, but organizations continue to struggle with their digital transformation to become data-driven. What’s the challenge, and how can it be solved?

The thing is, businesses are embedding AI solutions into their business portfolio, but face issues in the form of cost, privacy, security, integration, and even regulation. But could analytics play a role in accelerating the onboarding of AI in enterprises? After all, enterprises that have deployed analytics are twice as likely to receive senior management buy-in for AI adoption.

While many believe AI to be part of a big digital revolution, analytics is part of the evolution that could lead to successful AI implementation. For example, machine learning models are most effectively trained on huge datasets. Similarly, in organizations that are analytically aware, more specifically those that already deal with data integration and preparation, data wrangling, and more, AI is a natural progression.

Artificial intelligence, in a way, is a straightforward transition for those organizations with a mature analytics system. Research even suggests that global technology leaders that are most successful with adopting AI-based technologies often incorporate a data strategy into their core business functionalities – APIs, interfaces, and more.

An enterprise-wide policy on data standards is one method to streamline analytics and the machine learning practice. Furthermore, maintaining such a data policy could help identify stakeholders, monitor enterprise-wide access and strategy, and reduce employee confusion.

 

AI Matures Over Time with Analytics

Artificial intelligence and machine learning mature over time, depending on the data and the quality of that data. This speaks to organizations’ investment in data warehouses or data storage as part of the process of aligning assets for AI implementation. After all, data quality is a direct measure of the quality of predictions.

In time, we are likely to witness companies focus on solving the challenge of acquiring and maintaining accurate data so that AI can live up to its promise of a data and business revolution. At the same time, it is also important to understand that penetration and maturity aren’t always positively correlated. For instance, even with the deepest analytics penetration of all sectors, e-Commerce is known to have the lowest maturity.

 

Analytics to Pave Way for AI Adoption

In today’s era, organizations must possess a solid understanding of the business intelligence (BI) stack, including capabilities for analytics storage, governance, and the ability to manage unstructured and structured data. These tools and techniques are the building blocks of an effective AI strategy. Let’s take a look at more ways in which analytics positively underpins an AI-based future:

  1. An investment in big data analytics is critical to the success of combining unstructured and structured data that sits alongside legacy data sources such as ERP and CRM systems.
  2. Investing in big data architecture or strategy strengthens the BI technology stack across storage, ingestion, modelling, discovery, visualization, machine learning, and analytics.
  3. On top of this, organizations must explore the tools required to enable data visualization and exploration by end-users and the business itself.
  4. Building an enterprise-wide business management system enables companies to create robust platforms for big data for more than just descriptive analytics. It could include reporting and implementation methodologies around machine learning, artificial intelligence, predictive and prescriptive analytics at scale.
  5. An enterprise-wide BI platform could also accelerate AI adoption via algorithms, deployment of best practices, and solutions. In fact, an organization’s deep analytics expertise can help in leveraging AI and ML more effectively.

Organizations now operate in an ecosystem that increasingly requires decision-making with significant technology implications. But understanding the difference between AI, ML and analytics, and the role of the latter in augmenting the former, is key to business-critical success. In the end, it’s always been about choosing the right tools for the right job.

Jay Nair – Chief Operating Officer, Marlabs Inc.

As the COO, Jay has played an important role in accelerating the transformation of Marlabs into a digital services and solutions provider. He spearheaded the Digital360 initiative, which offers a complete suite of digital services across industries. Jay’s broad and varied business experience and skills helped Marlabs incubate NextGen technologies that provide outstanding business value. He also played an important role in transforming the company from a small group of 15 to more than 2,300 employees globally, growing into a $100 million company.


How To Become A Successful R Programmer?

MMS Founder
MMS RSS

Article originally posted on Data Science Central. Visit Data Science Central

R is a programming language for statistical computing first developed in 1993. Ross Ihaka and Robert Gentleman, two academics at the University of Auckland in New Zealand, first conceived R, and the first stable version (R 1.0.0) was released in 2000. R holds an extensive catalog of statistical and graphics methods, including machine learning algorithms, time-series analysis, linear regression, statistical inference and many more.

R is used for statistical analysis and for creating publication-quality data visualizations, and today it is one of the most popular languages at many industry giants. Data analysis in R typically follows a series of steps: program, transform, discover, model and communicate the results.

People who are skilled (or certified) in R programming are R programmers. R is free, runs on any major operating system, and can be installed and used without ever purchasing a license.

Why Are R Programmers in Great Demand with Employers?

Let's now look at why R programming is so important to learn. Industries are being flooded with data, and next-generation technologies rely on that data to make devices and services smarter in an increasingly digital world.

None of this is possible without data science technologies. R has ranked among the most preferred languages for data work over the past few years. The reasons programmers prefer it over others are:

  • Attractive salary: industry salary surveys have repeatedly ranked R among the best-paid technical skills.
  • Important for data science: there are three primary reasons:
      • Run your code without a compile step: R is an interpreted language, so code can be executed and iterated on quickly at the console.
      • Calculation with vectors: R is a vectorized language, so functions can be applied to whole vectors at once rather than element by element (see the sketch after this list).
      • Statistical language: R is also Turing-complete, so general programming tasks can be performed alongside statistical work.
  • Trend: according to popular language rankings, R has climbed steadily in popularity since 2008, so R programmers are in high demand at companies.
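
As a small illustration of the interpreted and vectorized points above, the R sketch below can be typed straight into the console; the variable names and sample values are invented for the example.

    # R is interpreted: each line runs immediately at the console,
    # with no separate compile step.
    heights_cm <- c(172, 165, 180, 158, 190)   # a numeric vector (sample values)

    # Vectorized computation: operators and functions apply to the whole
    # vector at once, so no explicit loop is needed.
    heights_m <- heights_cm / 100              # element-wise division
    mean(heights_m)                            # mean of the vector
    sd(heights_m)                              # standard deviation

    # A user-defined function is vectorized too when built from vectorized operations.
    bmi <- function(weight_kg, height_m) weight_kg / height_m^2
    weights_kg <- c(70, 62, 85, 55, 95)
    round(bmi(weights_kg, heights_m), 1)       # returns a vector of BMI values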

Why Is R Essential for Industry?

  • As noted earlier, R is free and open source and offers excellent visualization; researchers often find its capabilities go further than those of comparable commercial tools.
  • A data-driven organization can adopt R as its analytics platform and recruit trained R programmers to run it.

Roles and Responsibilities

  • Full support and documentation: R's online resources include well-maintained, well-documented mailing lists and message boards where many R developers discuss the packages and tools they build, covering everything from data manipulation to random-effects regression models.
  • More appealing to employers: R is an inherently valuable skill for any industry that relies on data analysis. Commercial statistical packages carry high licensing costs at the enterprise level, so employers who hire people who know R can save thousands without purchasing proprietary software.
  • Acquire, clean and analyze data in one place: with R, data acquisition, cleaning and analysis can all happen in a single environment (see the sketch after this list).
  • Use data visualization tools: you can take advantage of R's excellent data visualization capabilities.
  • Speed of learning: R is easy to pick up, and many communities and companies offer high-quality courses online.
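
As a hedged, minimal sketch of that "one place" workflow, the script below acquires, cleans, summarizes and plots a dataset entirely in base R; the URL and column names are placeholders invented for the example.

    # Acquisition: read a CSV from a (hypothetical) URL.
    url <- "https://example.com/sales.csv"
    raw <- read.csv(url, stringsAsFactors = FALSE)

    # Cleaning: drop rows with missing revenue and normalize a text column.
    clean <- raw[!is.na(raw$revenue), ]
    clean$region <- trimws(tolower(clean$region))

    # Analysis: total revenue per region.
    by_region <- aggregate(revenue ~ region, data = clean, FUN = sum)

    # Visualization: a simple bar chart with base graphics.
    barplot(by_region$revenue, names.arg = by_region$region,
            main = "Revenue by region", ylab = "Revenue")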

The Latest Trend!

R has become a firm favorite over the past couple of decades and is used for data analysis at top companies around the world. Several organizations regularly monitor and publish reports on trends in the data science world; one 2015 survey reported roughly a 40% rise in demand for R analytics professionals.

R is booming alongside technologies such as Python and Hadoop, and it is now used by roughly two million people. Careers built on R span artificial intelligence, machine learning and business intelligence. A solid grounding in mathematics, programming experience and the ability to carry out financial analysis are needed to pursue this path.

If you are thinking of a career in data science, 2018 offers a plethora of profitable opportunities. A career in R programming can lead you to roles such as:

  • R programmer
  • Data Scientist
  • Data Analyst
  • Data Visualization Analyst
  • Database Administrator

Who can become an R Programmer?

R programming best suits those with an interest in machine learning, statistical analysis and data mining; it is not recommended for general programmers with no interest in data-oriented work.

Skills Required

Programming Languages

A data scientist has to be proficient in at least one language or tool such as R, SAS, Python or Hadoop. It is not only about writing code; you also need to be comfortable using various programming environments to analyze data. Data science skills currently attract unprecedented value and interest from businesses around the world.

Understanding the statistics

Probability, hypothesis testing, and inferential and descriptive statistics are essential topics for data science. An intuitive understanding of them is needed to interpret statistical output in a business context (a small example follows below).
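
As a small, hedged example of that statistical intuition in practice, the snippet below runs a two-sample t-test on simulated data; the scenario and numbers are invented for illustration.

    # Simulated question: did a checkout-page change alter average order value?
    set.seed(42)
    control   <- rnorm(200, mean = 50, sd = 12)   # order values before the change
    treatment <- rnorm(200, mean = 53, sd = 12)   # order values after the change

    # Two-sample t-test: the p-value and confidence interval still need
    # interpretation, not just reporting, to support a business decision.
    result <- t.test(treatment, control)
    result$p.value     # chance of a difference this large if there were truly none
    result$conf.int    # plausible range for the true difference in means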

Machine learning

Machines excel at categorizing and computing over large volumes of unstructured data, and they can surface trends or patterns that are not obvious to a data scientist. They rarely do this unaided, however: models must be trained and supervised, so you need the skills to help computers learn from data, derive insights and deliver practical solutions (a minimal sketch follows below).
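
A minimal sketch of supervised learning in base R, using the built-in iris data set and glm; it illustrates the train-and-predict loop described above rather than a production pipeline.

    # Train a logistic regression to separate one species from the rest,
    # then check accuracy on held-out rows.
    data(iris)
    iris$is_virginica <- as.integer(iris$Species == "virginica")

    set.seed(1)
    train_idx <- sample(nrow(iris), 100)          # simple random train/test split
    train <- iris[train_idx, ]
    test  <- iris[-train_idx, ]

    model <- glm(is_virginica ~ Petal.Length + Petal.Width,
                 data = train, family = binomial)

    pred <- predict(model, newdata = test, type = "response") > 0.5
    mean(pred == (test$is_virginica == 1))        # held-out accuracy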

Visualization Skills

Knowledge of data visualization tools such as QlikView, Tableau, Sisense or Plotly ensures you can confidently present insights to technical as well as non-technical audiences and convince them of the business value those insights carry.

Communication

An R programmer must be a strong communicator. They work with many professionals and stakeholders to solve real-life problems, and they must understand both the data and the domain they are working in.

Recommended Courses for R Programming!

A good R programming course explains the concepts and then has the learner tackle real-life tasks and problems. Some of the R programming courses offered by top-rated online providers such as Simplilearn, Udemy and Coursera are:

  • Data Science with R Programming, by Simplilearn
  • R Programming, by Johns Hopkins University
  • The R Programming Environment: How to Set Up an Environment to Write and Test Your R Programming Code
  • R Programming A-Z: R for Data Science with Real Exercises
  • Data Science and Machine Learning Bootcamp with R

To become an R programmer, a good command of the language and the ability to adapt to changes in technology are the keys to success. R programmers are valuable in today's world because of R's growing importance across industries.
