New Book: Enterprise AI – An Applications Perspective

MMS Founder
MMS RSS

Now published. Enterprise AI: An Applications Perspective takes a use-case-driven approach to understanding the deployment of AI in the Enterprise. Designed for strategists and developers, the book provides a practical and straightforward roadmap for AI in Enterprises, based on application use cases. The authors (Ajit Jaokar and Cheuk Ting Ho) are data scientists and AI researchers who have deployed AI applications in Enterprise domains. The book is used as a reference for Ajit and Cheuk’s new course on Implementing Enterprise AI.

Download the book (members only) 

The book is available to Data Science Central members only. If you have any issues accessing the book, please contact us at info@datasciencecentral.com.

Contents

Introduction 

  • Machine Learning, Deep Learning and AI 
  • The Data Science Process 
  • Categories of Machine Learning algorithms 
  • How to learn rules from Data? 
  • An introduction to Deep Learning 
  • What problem does Deep Learning address? 
  • How Deep Learning Drives Artificial Intelligence 

Deep Learning and neural networks

  • Perceptrons – an artificial neuron 
  • MLP – How do Neural networks make a prediction? 
  • Spatial considerations – Convolutional Neural Networks 
  • Temporal considerations – RNN/LSTM 
  • The significance of Deep Learning 
  • Deep Learning provides better representation for understanding the world 
  • Deep Learning a Universal Function Approximator 

What functionality does AI enable for the Enterprise? 

  • Technical capabilities of AI
  • Functionality enabled by AI in the Enterprise value chain 

Enterprise AI applications 

  • Creating a business case for Enterprise AI 
  • Four Quadrants of the Enterprise AI business case 
  • Types of AI problems 

Enterprise AI – Deployment considerations 

  • A methodology to deploy AI applications in the Enterprise 
  • DevOps and the CI/CD philosophy

Conclusions 

Clustering – Algorithms for Partitioning and Assignments

The K-means algorithm is a popular and efficient approach for clustering and classification of data. My first introduction to the K-means algorithm was when I was conducting research on image compression. In that application, the purpose of clustering was to represent a group of objects or vectors by a single object/vector with an acceptable loss of information. More specifically, we needed a clustering process in which the centroid of each cluster was optimal for that cluster and the clusters were optimal with respect to the centroids.

The dimensionality of the vectors ranged from 4 to 70 and even higher, and each cluster of N-dimensional vectors was to be represented by a single vector while minimizing a certain fidelity criterion or loss of information. The design process consisted of using a training set of vectors and then applying the results to vectors outside the training set.

Naturally, a couple of questions arise here. First, what kind of cost function, fidelity criterion, or distortion measure should be used to represent the penalty that our clustering process pays for representing each cluster by only a single vector?

Second, how many clusters should there be, and how should the samples be assigned to each cluster? Choosing the number of clusters is a difficult task. Of course, if you have as many clusters as there are samples or vectors, the distortion is trivially minimized, since every vector represents itself with zero loss. But that is no longer clustering the data. In any practical application, you have to represent all the samples/vectors with a few samples/vectors and therefore create a set of clusters of vectors.

Let’s discuss each of these questions.

Which Cost Function, Distortion Measure or Fidelity Criterion?

The answer depends on your application. A widely used measure is the mean squared error (MSE), which is based on the squared L2-norm (Euclidean distance). The L2-norm is used in most applications, including signal processing, image compression, and video compression. Some applications use the absolute error difference, the L1-norm, also known as the Manhattan distance. There are many other error measures, such as the Itakura-Saito measure, a weighted distortion measure used for speech. The distortion measure indicates the penalty to be paid for representing the members of a cluster by a single vector. As a result, the centroid of the cluster, which represents the cluster itself, is the generalized center of gravity of the cluster. When mean squared error is used as the distortion measure, the center of gravity of the cluster is the mean value of all the vectors in the cluster. To generalize this to other distortion measures, we use the term generalized center of gravity for the centroid of the cluster.
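To make the generalized center of gravity concrete, here is a minimal JavaScript sketch of the two norm-based measures above and the centroid each one implies. It assumes vectors are equal-length arrays of numbers. The function names are illustrative and do not come from any library, and the Itakura-Saito measure is not covered. For squared error, the optimal representative is the component-wise mean. For absolute error, it is the component-wise median.

```javascript
// Squared Euclidean (L2) error: the per-vector term of the mean squared error.
function squaredError(x, y) {
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    const diff = x[i] - y[i];
    sum += diff * diff;
  }
  return sum;
}

// Absolute (L1, Manhattan) error.
function absoluteError(x, y) {
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    sum += Math.abs(x[i] - y[i]);
  }
  return sum;
}

// Centroid that minimizes total squared error: the component-wise mean.
function meanCentroid(vectors) {
  const c = new Array(vectors[0].length).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < c.length; i++) c[i] += v[i];
  }
  return c.map(s => s / vectors.length);
}

// Centroid that minimizes total absolute error: the component-wise median.
// For an even count, any value between the two middle ones is optimal; we average them.
function medianCentroid(vectors) {
  const c = [];
  for (let i = 0; i < vectors[0].length; i++) {
    const col = vectors.map(v => v[i]).sort((a, b) => a - b);
    const mid = Math.floor(col.length / 2);
    c.push(col.length % 2 === 1 ? col[mid] : (col[mid - 1] + col[mid]) / 2);
  }
  return c;
}
```

For the one-dimensional vectors [0], [1] and [10], the mean is about 3.67 while the median is 1. This shows that the L1 centroid is less sensitive to an outlying vector.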

The second question concerns how many clusters to use and how to assign samples to them. The second part is easy to answer, since the assignment of vectors to clusters follows directly from the cost function used in your system: each vector is assigned to the cluster whose centroid represents it with the lowest distortion. The number of clusters, however, is a more important issue and will be discussed later.
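The assignment rule can be written in a few lines. This is a minimal sketch that assumes vectors are arrays of numbers and uses squared error as a default, swappable distortion measure. The function name assignToNearest is illustrative. Ties go to the lower-numbered centroid.

```javascript
// Default distortion: squared Euclidean error.
function squaredError(x, y) {
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    const diff = x[i] - y[i];
    sum += diff * diff;
  }
  return sum;
}

// Return, for each vector, the index of the centroid with the lowest distortion.
function assignToNearest(vectors, centroids, distortion = squaredError) {
  return vectors.map(v => {
    let best = 0;
    let bestD = distortion(v, centroids[0]);
    for (let j = 1; j < centroids.length; j++) {
      const d = distortion(v, centroids[j]);
      if (d < bestD) {
        bestD = d;
        best = j;
      }
    }
    return best;
  });
}
```

Passing a different function as the third argument, for example an L1 distance, changes the partition without touching the rest of the code.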

Now, how do we decide on the clusters and their representation? We can answer this question with an iterative optimization technique.

More specifically, in this approach, we alternately optimize

a) the clusters for the given centroids, and

b) the centroids for the clusters,

until an optimum partitioning of clusters and centroid assignment is reached.
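The conditions can also be written in symbols. The notation here is introduced only for illustration: d(x, y) is the distortion measure, a_j is the centroid of cluster S_j, and T is the number of training vectors. Step a) assigns each vector to its lowest-distortion centroid, with ties broken arbitrarily. Step b) moves each centroid to its generalized center of gravity. Together they drive down the average distortion D:

```latex
\begin{aligned}
S_j &= \{\, x : d(x, a_j) \le d(x, a_k) \ \text{for all } k \,\}, \qquad j = 1, \dots, N \\
a_j &= \arg\min_{y} \sum_{x \in S_j} d(x, y) \\
D   &= \frac{1}{T} \sum_{x} \min_{j} \, d(x, a_j)
\end{aligned}
```

For the squared error d(x, y) = ||x - y||^2, the minimizer in the second line is the mean of the vectors in S_j.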

We can represent this approach by the following algorithm.

Iterative Algorithm:

  1. Initialization: Given the number of centroids N, their initial values A(0), a distortion threshold e > 0, and the training points, set the iteration number m = 0 and the distortion D(-1) = Infinity.

  2. Given the centroid set A(m), find the minimum-distortion partition S(m) by assigning each point to the centroid that gives it the lowest distortion, and compute the resulting distortion D(m) based on the distortion criterion.

  3. If (D(m-1) - D(m))/D(m) < e, halt with A(m) and S(m) as the final centroids and partition. Otherwise, continue.

  4. Find the optimum centroid (generalized center of gravity) of each cluster in S(m) to form A(m+1), replace m by m+1, and go to step 2.

The value of the threshold e is the designer’s choice; 0.05 may be considered a commonly used value. The resulting partition and centroids provide at least a local minimum of the distortion, though not necessarily a global optimum.
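Here is a minimal JavaScript sketch of the iterative algorithm. It assumes squared error as the distortion measure, so each optimum centroid is the cluster mean, and points stored as arrays of numbers. D is the average distortion per point. Two details are assumptions added here and are not part of the description above: the iteration cap maxIter, and keeping an empty cluster's old centroid. The function names are illustrative.

```javascript
// Squared Euclidean distortion between two equal-length vectors.
function squaredError(x, y) {
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    const diff = x[i] - y[i];
    sum += diff * diff;
  }
  return sum;
}

// Optimum centroid for squared error: the component-wise mean.
function meanCentroid(vectors) {
  const c = new Array(vectors[0].length).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < c.length; i++) c[i] += v[i];
  }
  return c.map(s => s / vectors.length);
}

// Alternate minimum-distortion partitioning and centroid updates until the
// relative drop in average distortion falls below eps.
function iterate(points, initialCentroids, eps = 0.05, maxIter = 100) {
  let centroids = initialCentroids.map(c => c.slice());
  let prevD = Infinity; // D(-1) = Infinity
  for (let m = 0; ; m++) {
    // Partition: assign every point to its lowest-distortion centroid.
    let total = 0;
    const labels = points.map(p => {
      let best = 0;
      let bestD = Infinity;
      centroids.forEach((c, j) => {
        const d = squaredError(p, c);
        if (d < bestD) {
          bestD = d;
          best = j;
        }
      });
      total += bestD;
      return best;
    });
    const D = total / points.length;

    // Stopping test: small relative improvement, zero distortion, or iteration cap.
    if (D === 0 || (prevD - D) / D < eps || m + 1 >= maxIter) {
      return { centroids, labels, distortion: D };
    }
    prevD = D;

    // Centroid update: move each centroid to the mean of its cluster.
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => labels[i] === j);
      return members.length > 0 ? meanCentroid(members) : c;
    });
  }
}
```

For example, iterate([[0], [1], [10], [11]], [[0], [1]]) moves the two centroids to [0.5] and [10.5]. The points then split into the clusters {0, 1} and {10, 11}, each point with distortion 0.25.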

Given the above algorithm, we can find a good partitioning and the best representation for that partitioning. What is missing from the algorithm is the initial set of centroids and clusters, without which we cannot execute it. Here, we let the samples/vectors themselves guide the partitioning and centroid selection, using the following algorithm, known as the splitting algorithm, to obtain N partitions.

Splitting Algorithm:

  1. Initialization: Set the number of centroids M = 1, and define A(1) to be the centroid of all the available points/vectors.

  2. Given A(M) containing M centroids, split each centroid a in A(M) into two vectors, a and (1+d)a, where d is a small number. The resulting set has 2M vectors; replace M by 2M.

  3. Run the Iterative Algorithm with these M centroids as its initial values to obtain a new A(M).

  4. Is M = N? If so, halt with the result of the previous step as the final clustering and its representation. Otherwise, go to step 2.
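The splitting algorithm can be sketched the same way. It is essentially the splitting initialization used in the Linde-Buzo-Gray (LBG) vector quantizer design. The sketch repeats the helpers from the iterative algorithm so that it runs on its own. The perturbation d = 0.01, the power-of-2 check, and the name lbgSplit are illustrative choices, and squared error is assumed as the distortion measure.

```javascript
function squaredError(x, y) {
  let sum = 0;
  for (let i = 0; i < x.length; i++) sum += (x[i] - y[i]) ** 2;
  return sum;
}

function meanCentroid(vectors) {
  const c = new Array(vectors[0].length).fill(0);
  for (const v of vectors) for (let i = 0; i < c.length; i++) c[i] += v[i];
  return c.map(s => s / vectors.length);
}

// Iterative algorithm (same logic as the previous sketch).
function iterate(points, initialCentroids, eps = 0.05, maxIter = 100) {
  let centroids = initialCentroids.map(c => c.slice());
  let prevD = Infinity;
  for (let m = 0; ; m++) {
    let total = 0;
    const labels = points.map(p => {
      let best = 0, bestD = Infinity;
      centroids.forEach((c, j) => {
        const d = squaredError(p, c);
        if (d < bestD) { bestD = d; best = j; }
      });
      total += bestD;
      return best;
    });
    const D = total / points.length;
    if (D === 0 || (prevD - D) / D < eps || m + 1 >= maxIter) {
      return { centroids, labels, distortion: D };
    }
    prevD = D;
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => labels[i] === j);
      return members.length > 0 ? meanCentroid(members) : c;
    });
  }
}

// Splitting algorithm: start from the global centroid and double the number
// of centroids until there are N of them (N must be a power of 2 here).
function lbgSplit(points, N, d = 0.01, eps = 0.05) {
  if (!Number.isInteger(N) || N < 1 || (N & (N - 1)) !== 0) {
    throw new Error("N must be a power of 2 for this splitting scheme");
  }
  // Initialization: one centroid, the mean of all points.
  let result = iterate(points, [meanCentroid(points)], eps);
  while (result.centroids.length < N) {
    // Perturbation: keep each centroid a and add (1 + d) * a.
    // Note: a centroid that is exactly zero yields an identical copy.
    const doubled = [];
    for (const a of result.centroids) {
      doubled.push(a.slice(), a.map(v => (1 + d) * v));
    }
    // Refinement: run the iterative algorithm from the 2M centroids.
    result = iterate(points, doubled, eps);
  }
  return result;
}
```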

The preceding algorithm creates the clustering along with the optimum centroid for each cluster. Note that the number of clusters produced by this algorithm is a power of 2, because each pass of the splitting process doubles it. Of course, we can obtain any number of centroids by changing the splitting process in step 2 to perturb only one of the centroids at a time, for example the one whose cluster exhibits the maximum distortion. With this approach, the number of clusters increases by one in each iteration until the desired number of clusters is reached.

As discussed before, the final number of clusters is hard to determine. The user needs to decide how many clusters best suit the application under consideration. Alternatively, we can keep growing the number of clusters until the total distortion of the clusters falls below a desired threshold.
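Here is a sketch of the one-at-a-time variant combined with the distortion-threshold stopping rule. In each round it perturbs only the centroid of the cluster with the largest total distortion and then reruns the iterative algorithm. The target average distortion, the maxClusters cap, and the function names are assumptions for illustration. Squared error is again the distortion measure.

```javascript
function squaredError(x, y) {
  let sum = 0;
  for (let i = 0; i < x.length; i++) sum += (x[i] - y[i]) ** 2;
  return sum;
}

function meanCentroid(vectors) {
  const c = new Array(vectors[0].length).fill(0);
  for (const v of vectors) for (let i = 0; i < c.length; i++) c[i] += v[i];
  return c.map(s => s / vectors.length);
}

// Iterative algorithm (same logic as the earlier sketch).
function iterate(points, initialCentroids, eps = 0.05, maxIter = 100) {
  let centroids = initialCentroids.map(c => c.slice());
  let prevD = Infinity;
  for (let m = 0; ; m++) {
    let total = 0;
    const labels = points.map(p => {
      let best = 0, bestD = Infinity;
      centroids.forEach((c, j) => {
        const d = squaredError(p, c);
        if (d < bestD) { bestD = d; best = j; }
      });
      total += bestD;
      return best;
    });
    const D = total / points.length;
    if (D === 0 || (prevD - D) / D < eps || m + 1 >= maxIter) {
      return { centroids, labels, distortion: D };
    }
    prevD = D;
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => labels[i] === j);
      return members.length > 0 ? meanCentroid(members) : c;
    });
  }
}

// Add one centroid per round by splitting the worst cluster, until the
// average distortion is <= target or maxClusters centroids exist.
function growClusters(points, target, maxClusters, d = 0.01, eps = 0.05) {
  let result = iterate(points, [meanCentroid(points)], eps);
  while (result.distortion > target && result.centroids.length < maxClusters) {
    // Total distortion contributed by each cluster in the current partition.
    const perCluster = result.centroids.map(() => 0);
    points.forEach((p, i) => {
      const j = result.labels[i];
      perCluster[j] += squaredError(p, result.centroids[j]);
    });
    const worst = perCluster.indexOf(Math.max(...perCluster));
    // Keep every centroid and add a perturbed copy of the worst one.
    const seeds = result.centroids.map(c => c.slice());
    seeds.push(result.centroids[worst].map(v => (1 + d) * v));
    result = iterate(points, seeds, eps);
  }
  return result;
}
```

With the points [1], [2], [10], [11], [30] and [31] and a target of 1, the sketch stops at three clusters centered on 1.5, 10.5 and 30.5, with an average distortion of 0.25.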

Summary:

We discussed two algorithms that, together, produce a partitioning of the data into clusters and a centroid for each cluster that are optimal, at least locally, with respect to a predefined distortion measure.

I hope this discussion was helpful. Best Regards, Faramarz Azadegan, Ph.D., faramarz_azadegan@yahoo.com

Cloudera and Hortonworks Merge with Goal to Increase Competition with Cloud Offerings

Earlier this month, Cloudera and Hortonworks announced an all-stock merger with a combined value of around $5.2 billion. Analysts have argued that the merger is a response to the increasing competition both companies face from cloud vendors like Amazon, Google and Microsoft.

IBM supports both Hadoop MapReduce and Spark in its BigInsights platform. Some analysts have pointed out that this convergence of Hadoop with Spark/Flink and other streaming technologies is already happening on the cloud platforms, leaving Cloudera’s and Hortonworks’ base offerings behind. Both companies, and perhaps Hortonworks even more so, have placed their bets on partnerships with cloud vendors to provide turnkey solutions in the cloud. Microsoft’s HDInsight is based on Hortonworks’ Hadoop platform, and Hortonworks is also partnering with Red Hat and IBM on the Open Hybrid Architecture initiative.

On the merger, Hortonworks’ CTO reassured existing customers of both companies that the new entity will “continue supporting the technology for a period of time”. According to Hortonworks’ CTO, part of the reasoning behind the merger is developing a unified, containerised architecture with a “frictionless hybrid” approach to big data management, allowing data to be stored seamlessly both on-premises and with cloud providers. Startups like Qubole offer cross-cloud, self-service big data management platforms, and the Cloudera-Hortonworks merger could also enable the combined company to offer services in this space.

Syncsort’s CTO pointed out that while Hortonworks’ offerings focus on IoT and streaming data use cases, Cloudera focuses on Data Science, Machine Learning and Artificial Intelligence. In her opinion, this complementarity could make the merger a success, advancing the combined company further and faster than either could go on its own. On the other hand, the CEO and chairman of MapR Technologies, a direct competitor of both Hortonworks and Cloudera, saw no innovation benefits in the merger and said that it is all about rationalization and cost cutting.

Aside from the main product offerings, there are questions about the future of related products in the Hadoop ecosystem. Cloudera has been backing Sentry, Impala and its own Cloudera Manager product, while Hortonworks’ offering was based on another set of open source products like Hive, Ambari, Atlas, NiFi and Ranger. Because the competing products overlap, it remains to be seen which product lines will fall out of favour in the new combined company. Both companies’ CEOs stated that the merger should be viewed as a combination of equals, although the combined company will be 60% owned by Cloudera shareholders and 40% by Hortonworks shareholders.

Article: Getting Started With Vue.js

Key Takeaways

  • Learn how to get started with Vue.js by creating a simple image search application
  • Understand the benefits of Vue.js
  • Leverage services and tools including Vue.js, Vue CLI, Bulma, Giphy Search, Axios, Cloudinary, and AWS to build a fullstack application

TL;DR

In this rather long tutorial, you’ll learn how to use Vue CLI to build a Vue.js 2 application that searches for and displays Giphy images, and then manipulates/transforms them using the Cloudinary API. You’ll also learn how to deploy this application to AWS.

We’ll be using Vue.js 2, which will be referred to as Vue.js in the rest of this post.

Introduction

Due to Vue’s progressive and flexible nature, it’s ideally suited for teams that have to rewrite old codebases one step at a time.

If you want to check out some detailed comparisons between popular frontend frameworks, here are three great posts:

I tend to answer general ‘framework war’ kind of questions in the following way:

  • Stop with the ‘analysis paralysis’ discussion.
  • Do your research, pick a framework—any framework—that the community is using, use it, and see how far it gets you.
  • It isn’t worthwhile to discuss X being slow or Y being better until you try it for your specific use case and preference.

In this post, we’ll use Vue.js and Cloudinary to apply some image manipulation techniques to the images that we fetch with the Giphy API in the demo application we build. Finally, we’ll use AWS to host the application. This is a hands-on, start-to-finish tutorial with every step outlined.

Demo app

You can fork the complete source code on GitHub, or you can see the finished app in action.

Prerequisites

Make sure that you have the following tools installed:

Vue CLI

As detailed in the Vue CLI README:

Vue CLI aims to be the standard tooling baseline for the Vue ecosystem. It ensures the various build tools work smoothly together with sensible defaults so you can focus on writing your app instead of spending days wrangling with configurations. At the same time, it still offers the flexibility to tweak the config of each tool without the need for ejecting.

To install vue-cli, run:

npm install -g @vue/cli

To confirm that the installation ran successfully, run:

vue --help

For reference, the version of the CLI (vue --version) used in this tutorial is 3.0.0-rc.3.

Starting a new app with Vue CLI

We’ll call our app image-search-manipulator. Let’s start a new app using vue-cli:

vue create image-search-manipulator

After this command finishes, let’s cd into the project and run it:

cd image-search-manipulator
npm run serve

 

You should get a console log similar to:

DONE  Compiled successfully in 7859ms

Your application is running here: http://localhost:8080
 

If you visit http://localhost:8080 in your browser, you should see the following page:

Folder structure

Now, let’s open this project in the editor of your choice (I’m using Visual Studio Code), and you should see something like this:

Here we’ll only focus on the src folder. The contents of that folder should be something like this:

Adding content

If you search for the string Welcome to Your Vue.js App, you’ll see the string is within the HelloWorld.vue file. A *.vue file is a custom file format that uses HTML-like syntax to describe a Vue component. This file contains the following (… is used for brevity in the listing below):

<template>
  <div class="hello">
    <h1>{{ msg }}</h1>
    <h2>Essential Links</h2>
    <ul>
      ...
    </ul>
  </div>
</template>

<script>
export default {
  name: 'HelloWorld',
  data () {
    return {
      msg: 'Welcome to Your Vue.js App'
    }
  }
}
</script>

<!-- Add "scoped" attribute to limit CSS to this component only -->
<style scoped>
h1, h2 {
  font-weight: normal;
}
ul {
  list-style-type: none;
  padding: 0;
}
li {
  display: inline-block;
  margin: 0 10px;
}
a {
  color: #42b983;
}
</style>

Without knowing anything about Vue.js, we can see where we would change the Welcome to Your Vue.js App text. So, let’s change that to Welcome to Image Search Manipulator. While you do that, also remove the contents of the style tag, as we don’t need it here.

Adding input and a button

Our basic application should have one input field and one button.

To do this, add the following code to the template in the HelloWorld.vue file:

<input name="search">

<button>Search</button>
 

The template element of HelloWorld.vue should look like this now:

<template>
  <div class="hello">
    <h1>{{ msg }}</h1>
    
    <input name="search">

    <button>Search</button>
  </div>
</template>

Actions

A search input field and a button don’t do much on their own. We want to click the button and output something to the console, just to verify that the button is working correctly.

This is how we attach a method that handles the button click in Vue:

<button @click="performSearch">Search</button>

The @click syntax is shorthand for the equivalent, longer v-on:click form, which you will also see in Vue applications:

<button v-on:click="performSearch">Search</button>

At this point, the browser’s developer tools console will show a warning such as:

webpack-internal:///./node_modules/vue/dist/vue.esm.js:592 [Vue warn]: Property or method "performSearch" is not defined on the instance but referenced during render. Make sure that this property is reactive, either in the data option or for class-based components, by initializing the property. See https://vuejs.org/v2/guide/reactivity.html#Declaring-Reactive-Properties.

found in

---> <HelloWorld> at src/components/HelloWorld.vue
       <App> at src/App.vue
         <Root>

This warning appears because we haven’t defined the performSearch method yet. In the HelloWorld.vue file, add the following method definition to the methods property of the component object inside the script tag:

<script>
export default {
  name: "HelloWorld",
  data() {
    return {
      msg: "Welcome to Image Search Manipulator"
    };
  },
  methods: {
    performSearch: function() {
      console.log("clicked");
    }
  }
};
</script>

We’ve now defined the performSearch function, which doesn’t accept any parameters and has no return value.

Taking input

To print the string that was typed in the input field to the console, we first need to add a new attribute to the input field:

<input name="search" v-model="searchTerm">

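The v-model directive instructs Vue.js to bind the input to a searchTerm property, so that whenever the input text updates, the value of searchTerm is updated as well. For Vue to track searchTerm reactively, it should also be declared in the object returned by data() (for example, searchTerm: ""). You can learn more about v-model and other form input bindings in the Vue documentation.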

Finally, change the performSearch function in order to log the value to the console:

performSearch: function() {
    console.log(this.searchTerm);
}

Giphy search API

Next, we want to connect our search field to an API call that returns images. Giphy provides a search API, so we need to determine the request parameters needed to search Giphy’s database. Opening the search API link shows the format of the service:

In the next section, we’ll cover retrieving this data from within our app.

Vue.js HTTP requests

There are several ways to send HTTP requests in Vue.js. For additional options, check out this post about making AJAX calls in Vue.js.

In this tutorial, we’ll use Axios, a popular JavaScript library for making HTTP requests. It’s an HTTP client that uses the modern Promise API by default (instead of JavaScript callbacks) and runs on both the client and the server (Node.js). One advantage it has over the native fetch() function is that it automatically transforms JSON response data into JavaScript objects.

In your Terminal/Command prompt enter the following command to install Axios via npm:

npm install axios --save

Import Axios into the HelloWorld.vue file just after the opening script tag:

import axios from "axios";

The performSearch function should now look like this:

const link = "http://api.giphy.com/v1/gifs/search?api_key=dc6zaTOxFJmzC&q=";
const apiLink = link + this.searchTerm;

axios
  .get(apiLink)
  .then(response => {
    console.log(response);
  })
  .catch(error => {
    console.log(error);
  });

For reference, the contents of the HelloWorld.vue file should now be:

<template>
  <div class="hello">
    <h1>{{ msg }}</h1>
    
    <input name="search" v-model="searchTerm">

    <button @click="performSearch">Search</button>
  </div>
</template>

<script>
import axios from "axios";

export default {
  name: "HelloWorld",
  data() {
    return {
      msg: "Welcome to Image Search Manipulator",
      searchTerm: ""
    };
  },
  methods: {
    performSearch: function() {
      const link = "http://api.giphy.com/v1/gifs/search?api_key=dc6zaTOxFJmzC&q=";
      const apiLink = link + this.searchTerm;

      axios
        .get(apiLink)
        .then(response => {
          console.log(response);
        })
        .catch(error => {
          console.log(error);
        });
    }
  }
};
</script>

When you’re testing, you may also want to limit the number of images that you get from Giphy. To do that, add limit=5 to the link like this: http://api.giphy.com/v1/gifs/search?api_key=dc6zaTOxFJmzC&limit=5&q=
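Appending this.searchTerm directly to the URL works for simple words, but a term containing characters such as & or # would corrupt the query string. Here is a minimal sketch of a more robust approach. It assumes a runtime with the standard URL API (modern browsers and Node.js). buildGiphySearchUrl is a hypothetical helper, not part of Axios or the Giphy API.

```javascript
// Build a Giphy search URL with every query parameter properly encoded.
function buildGiphySearchUrl(apiKey, term, limit) {
  const url = new URL("https://api.giphy.com/v1/gifs/search");
  url.searchParams.set("api_key", apiKey);
  url.searchParams.set("q", term);
  if (limit !== undefined) {
    url.searchParams.set("limit", String(limit));
  }
  return url.toString();
}
```

Inside performSearch you could then call axios.get(buildGiphySearchUrl("dc6zaTOxFJmzC", this.searchTerm, 5)). Axios can also do the encoding itself through the params option of its request config, for example axios.get(url, { params: { q: this.searchTerm, limit: 5 } }).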

Now, if you run the app, enter something in the search box, and click the search button, you’ll see something like this in your console log:

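Axios returns a response object whose data property holds Giphy’s JSON response. That JSON has its own data property, an array of 25 objects by default (or as many as the limit parameter allows), which hold information about the images that we want to show in our app.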

To show an image, we need to drill down into each object’s images property, then fixed_height_still, and finally the url property.

Also, we don’t want to just show one image but all of the images. We’ll use the v-for directive for that:

<img v-for="i in images" :key="i.id" :src="i.images.fixed_height_still.url">

The v-for directive is used to render a list of items based on an array and it requires a special syntax in the form of item in items, where items is the source data array and item is an alias for the array element being iterated on.
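The drill-down into images.fixed_height_still.url can also be done once, in plain JavaScript, before the data reaches the template. This is a minimal sketch that assumes the response shape described above: a data array whose items have an id and an images.fixed_height_still.url. toStillImages is a hypothetical helper name.

```javascript
// Flatten a parsed Giphy search response into [{ id, url }, ...],
// skipping any entry that lacks a fixed_height_still rendition.
function toStillImages(body) {
  const gifs = body && Array.isArray(body.data) ? body.data : [];
  return gifs
    .filter(g => g.images && g.images.fixed_height_still && g.images.fixed_height_still.url)
    .map(g => ({ id: g.id, url: g.images.fixed_height_still.url }));
}
```

With this helper, the component could set this.images = toStillImages(response.data) and render <img v-for="i in images" :key="i.id" :src="i.url">.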

For reference, here’s the full listing of the HelloWorld.vue file:

<template>
  <div class="hello">
    <h1>{{ msg }}</h1>
    
    <input name="search" v-model="searchTerm">

    <button @click="performSearch">Search</button>

    <div>
        <img v-for="i in images" :key="i.id" :src="i.images.fixed_height_still.url">
    </div>
  </div>
</template>

<script>
import axios from "axios";

export default {
  name: "HelloWorld",
  data() {
    return {
      msg: "Welcome to Image Search Manipulator",
      searchTerm: "dogs",
      images: []
    };
  },
  methods: {
    performSearch: function() {
      const link =
        "http://api.giphy.com/v1/gifs/search?api_key=dc6zaTOxFJmzC&limit=5&q=";
      const apiLink = link + this.searchTerm;

      axios
        .get(apiLink)
        .then(response => {
          this.images = response.data.data;
        })
        .catch(error => {
          console.log(error);
        });
    }
  }
};
</script>

At this point, if we take a look at the app and search for ‘coding’, we’ll get this:

Making the app look pretty with Bulma

Even though our example application functions as expected, the results do not look great.

Bulma is an open source CSS framework based on Flexbox. It is similar to Bootstrap but has fewer interdependencies, and it is pure CSS with no JavaScript.

First, let’s install Bulma:

Run yarn add bulma if you’re using Yarn, or npm install bulma if you’re using npm.

Add Bulma to the App.vue file, just after the import HelloWorld from "./components/HelloWorld"; line, like this:

import "../node_modules/bulma/css/bulma.css";
 

Here’s the template from the HelloWorld.vue file after adding Bulma classes to it:

<template>
    <div class="container">
        <h1 class="title">{{ msg }}</h1>
    
        <input name="search" class="input" v-model="searchTerm">
        <br><br>
        <button @click="performSearch" class="button is-primary is-large">Search</button>

        <div class="myContent">
            <img v-for="i in images" :key="i.id" :src="i.images.fixed_height_still.url">
        </div>
    </div>
</template>

Here’s a recap of what we did:

  • Added the class container to the first div tag
  • Added the class title to the h1 tag
  • Added the class input to the input tag
  • Added the following classes to the button: button is-primary is-large

For demonstration purposes, the myContent class was added like this:

.myContent {
  padding-top: 20px;
}

And a bit of padding was added around the img tags:

img {
    padding: 5px;
}

These rules were added to the style tag of the HelloWorld.vue file like this:

<style>
.myContent {
  padding-top: 20px;
}

img {
  padding: 5px;
}
</style>

With these few quick changes, we now have a better-looking app:

Bulma has many options without the overhead typical of larger CSS frameworks.

Image manipulation

We’ll now use a few techniques to manipulate the images.

Cloudinary is a service offering many image manipulation options.

After you create a Cloudinary account, install the Cloudinary JavaScript library (cloudinary-core):

npm install cloudinary-core --save

For reference, here’s the full source code listing and a summary of changes:

  • Added new div <div class="myContent" v-html="manipulatedImages"></div>. Because the manipulatedImages variable will contain HTML, we need to use the v-html directive to show it in the template as such, and not as a string. You can learn more about the v-html directive in the Vue documentation.
  • Imported Cloudinary import cloudinary from "cloudinary-core";
  • Called Cloudinary on each image returned from the Giphy API (the code starting with const cloudinaryImage = cl).

<template>
    <div class="container">
        <h1 class="title">{{ msg }}</h1>
    
        <input name="search" class="input" v-model="searchTerm">
        <br><br>
        <button @click="performSearch" class="button is-primary is-large">Search</button>

        <div class="myContent">
            <img v-for="i in images" :key="i.id" :src="i.images.fixed_height_still.url">
        </div>

        <div class="myContent" v-html="manipulatedImages"></div>
    </div>
</template>

<script>
import axios from "axios";
import cloudinary from "cloudinary-core";

export default {
  name: "HelloWorld",
  data() {
    return {
      msg: "Welcome to Image Search Manipulator",
      searchTerm: "dogs",
      images: [],
      manipulatedImages: ""
    };
  },
  methods: {
    performSearch: function() {
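      const link =
        "https://api.giphy.com/v1/gifs/search?api_key=dc6zaTOxFJmzC&limit=5&q=";
      const apiLink = link + this.searchTerm;

      // cloud_name identifies a Cloudinary account; replace "nikola" with your own.
      const cl = new cloudinary.Cloudinary({
        cloud_name: "nikola"
      });

      axios
        .get(apiLink)
        .then(response => {
          this.images = response.data.data;
          // Clear the results of any previous search before appending new tags.
          this.manipulatedImages = "";

          this.images.forEach(image => {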
            const cloudinaryImage = cl
              .imageTag(image.images.fixed_height_still.url, {
                width: 150,
                height: 150
              })
              .toHtml();

            this.manipulatedImages += cloudinaryImage;
          });
        })
        .catch(error => {
          console.log(error);
        });
    }
  }
};
</script>

<style>
.myContent {
  padding-top: 20px;
}

img {
  padding: 5px;
}
</style>

Deploying to AWS S3

We’re going to complete this tutorial by publishing our app on Amazon S3 (Amazon Simple Storage Service).

First, log in to the AWS Console (or create an AWS account) and search for S3:

Familiarize yourself with the interface and create a new AWS S3 bucket. Buckets are containers for the files (objects) we upload. Because bucket contents are private by default, make sure that you select the Grant public read access to this bucket setting when you create the bucket:

Finally, go into your S3 bucket and enable static website hosting (enter index.html as suggested in the Index document field):

Build it

Run npm run build to build a production version of your app. All production-ready files are going to be placed in the dist folder.

If you get a blank page after uploading the contents of the dist folder to the AWS bucket and visiting the website, the usual solution is to remove the / from assetsPublicPath in config/index.js:

If you encounter the blank page issue, perform the step above and repeat the npm run build command.

Upload

Upload the entire contents of the dist directory to the S3 bucket. This can be done with a simple drag and drop. Make sure you set permissions for the files so that they are publicly readable:

That’s all there is to it! 

You may view my version of the app created in this tutorial.

Backup

An important final tip is to back up your work. Several options are available for backing up apps on AWS, including the native AWS solution and third-party solutions that offer added features such as zero downtime. These might not be required for your basic app, but they are worth looking into for larger-scale use cases.

The right choice depends on your stack, but there are backup solutions for every stack. For example, on the Azure stack, you could use Azure Backup.

Conclusion

In this tutorial, we learned how to get started with Vue.js by building an application for searching images via Giphy’s API. We prettified our app with the Bulma CSS framework. Then we transformed the images using Cloudinary. Finally, we deployed our app to AWS.

Please leave any comments and feedback in the discussion section below and thank you for reading!

About the Author

Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Oracle, Zend, CheckPoint and Ixia, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.

WorkerDOM Adds DOM Concurrency for JavaScript Programming

The big news at this year’s JSConf was the introduction of WorkerDOM, a JavaScript library to make the DOM available to Web Workers, allowing developers to leverage multi-core processor architectures to improve web performance.

Web Workers have been available for many years, but the lack of DOM access has limited their adoption. The WorkerDOM team hopes the project will help renew interest in multi-threaded programming on the web and create better experiences for users.

Offloading work from the main thread to improve the user experience is an increasingly popular topic. At this year’s FullStack, software engineer James Milner presented techniques for improving user experience by passing processing work to a Web Worker, freeing up the main application thread.

Another goal of WorkerDOM is to make web performance competitive with native platforms by unlocking performance gains wherever possible, especially on mobile devices, where single-core processor speeds have not improved as quickly as the number of processor cores.

To achieve a full representation of the DOM inside Web Workers, WorkerDOM provides an efficient transport mechanism authored in TypeScript. Malte Ubl, tech lead for the AMP Project at Google, explains in the WorkerDOM announcement:

WorkerDOM hydrates server rendered DOM and then proxies mutations as an application makes changes to the page, such as reacting to user actions or running animations.

WorkerDOM gets installed via npm or yarn:

npm install @ampproject/worker-dom

yarn add @ampproject/worker-dom

Or it may be included as an ES module for browsers that provide native module support (everything modern except IE11 and Samsung Internet):

<script src="path/to/workerdom/dist/index.mjs" type="module"></script>
<script src="path/to/workerdom/dist/index.js" nomodule defer></script>

Detailed usage instructions are also available in the WorkerDOM README.

The full slides from the JSConf presentation, WorkerDOM: JavaScript Concurrency and the DOM, are also available.

WorkerDOM is currently in an alpha state, ready for experimentation and contributions. It also aims to provide compatibility with JavaScript frameworks, with initial support available for React, Preact, and Svelte. The project welcomes collaboration with framework and tool authors to provide an optimal experience for developers and users.

WorkerDOM is available under the Apache 2 open source license. Contributions are welcome via the WorkerDOM GitHub project and should follow the project’s contribution guidelines and code of conduct.

An Overview Of The Steps Required For A Successful System Integration

MMS Founder
MMS RSS

Systems integration is an increasingly common process as companies realise its value to their business. The process involves taking disparate systems and making them all work together as a whole.

Suppose, for instance, that your company has a sales arm. All the sales contact data will live in a CRM, let’s say Salesforce, since it’s the major player in the CRM game. What if you wanted to grab all your active customers and auto-enrol them in your brand new loyalty program software? How would you do that? A data dump? Excel files? Manually copying and pasting?

Systems interoperability is going mainstream

We are seeing a big upswing in systems interoperability from a commercial software standpoint. If you want to build a product that integrates with Salesforce, you can do so via its open API, writing code to link the two. If you already have your own in-house software, you can either build (code) your “calls” directly into it, or develop an API for it to talk to Salesforce and any other software products you may want to integrate with.

Apps like Zapier even do the heavy lifting for you: they’ve built their own code snippets that allow you to send, sync, and transform data between commercial software products. For instance, you can grab every new lead entered into Salesforce and send a message to Slack with the lead information.
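A trigger/action pair like the Salesforce-to-Slack example boils down to two parts: detect records that are new since the last run, then transform each one into a message for the target system. The sketch below shows that shape with plain Python data; the lead fields, the `last_seen_id` cursor and the message format are hypothetical, and no real Salesforce, Zapier or Slack API is called.

```python
def new_leads_since(leads: list[dict], last_seen_id: int) -> list[dict]:
    """Trigger: return only leads created after the last processed id."""
    return [lead for lead in leads if lead["id"] > last_seen_id]


def to_slack_text(lead: dict) -> str:
    """Action: format one lead as a chat message (hypothetical format)."""
    return f"New lead: {lead['name']} ({lead['company']}), {lead['email']}"


def run_once(leads: list[dict], last_seen_id: int) -> tuple[list[str], int]:
    """Process one polling cycle; return the messages and the new cursor."""
    fresh = new_leads_since(leads, last_seen_id)
    messages = [to_slack_text(lead) for lead in fresh]
    cursor = max((lead["id"] for lead in fresh), default=last_seen_id)
    return messages, cursor


crm_leads = [  # hypothetical export from the CRM
    {"id": 101, "name": "Ana Lopez", "company": "Acme", "email": "ana@acme.example"},
    {"id": 102, "name": "Raj Patel", "company": "Globex", "email": "raj@globex.example"},
]
messages, cursor = run_once(crm_leads, last_seen_id=101)
print(messages)  # ['New lead: Raj Patel (Globex), raj@globex.example']
print(cursor)    # 102
```

Keeping a cursor between runs means a rerun does not post the same lead twice.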

The secret to successful system integration…

Is a solid plan! So let’s check out the steps involved to make sure that everything goes smoothly when you’re building your systems lego masterpiece.

Step 1. Scope current software interfaces

To successfully integrate disparate software products, you need to know what is available from a data flow perspective. Some products will offer an API or data file outputs, which might be the only means of connecting them to other systems. For software you’ve built in-house, or open source software, you can alter the code to route data flows as you please.

Step 2. Requirements gathering

What are all the tasks you want to achieve by linking up these systems? Can you think of other tasks that “might be nice to have” later down the track? Keep a list of the systems you wish to link up, which tasks you’d like to run between which systems, the direction of data flow, and whether triggers or syncing will be used. With this list and the details of your current software interfaces, your systems integrator will be able to tell you which parts are achievable, and which may be hard work or impossible.
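The requirements list described in this step can be kept as structured data rather than prose, which makes it easier to review with a systems integrator. The sketch below assumes a hypothetical `IntegrationTask` record holding the fields the step mentions (source, target, direction of data flow, trigger vs. sync) plus a nice-to-have flag; the system and task names are examples only.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    ONE_WAY = "source -> target"
    TWO_WAY = "source <-> target"


class Mode(Enum):
    TRIGGER = "on event"  # e.g. fire when a new lead is created
    SYNC = "scheduled"    # e.g. reconcile both systems nightly


@dataclass
class IntegrationTask:
    name: str
    source: str
    target: str
    direction: Direction
    mode: Mode
    nice_to_have: bool = False


requirements = [
    IntegrationTask("Enrol active customers", "Salesforce", "Loyalty app",
                    Direction.ONE_WAY, Mode.SYNC),
    IntegrationTask("Announce new leads", "Salesforce", "Slack",
                    Direction.ONE_WAY, Mode.TRIGGER),
    IntegrationTask("Share loyalty tier with sales", "Loyalty app", "Salesforce",
                    Direction.ONE_WAY, Mode.SYNC, nice_to_have=True),
]

# Every system the integrator needs interface details for (see step 1).
systems = sorted({t.source for t in requirements} | {t.target for t in requirements})
must_haves = [t.name for t in requirements if not t.nice_to_have]
print(systems)     # ['Loyalty app', 'Salesforce', 'Slack']
print(must_haves)  # ['Enrol active customers', 'Announce new leads']
```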

Step 3. Outline the system integration architecture

This step involves documenting everything required to integrate two or more systems. It may include documents such as data flow diagrams, software version requirements, hardware requirements, and more. For each connection, it should note the input data types, whether the data needs to be transformed, and the output data types.
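Recording the input types, transformations and output types for a connection can be made concrete as a field mapping. The sketch below maps CRM contact records to enrolment records for the loyalty-program example from earlier; every field name, the "Active" status value and the one-point-per-10-spent rule are hypothetical, not the schema of Salesforce or of any loyalty product.

```python
from datetime import date

# Target field -> (source field, conversion applied on the way through).
FIELD_MAP = {
    "member_email": ("Email", str.lower),
    "full_name": ("Name", str.strip),
    "joined_on": ("CustomerSince", date.fromisoformat),  # "YYYY-MM-DD" text -> date
    "points": ("LifetimeSpend", lambda spend: int(float(spend) // 10)),  # hypothetical rule
}


def active_customers(records: list[dict]) -> list[dict]:
    """Select only the records the source system flags as active."""
    return [r for r in records if r.get("Status") == "Active"]


def transform(record: dict) -> dict:
    """Convert one source record into the target system's input shape."""
    return {target: convert(record[source])
            for target, (source, convert) in FIELD_MAP.items()}


crm_export = [
    {"Name": " Ana Lopez ", "Email": "Ana@Acme.example", "Status": "Active",
     "CustomerSince": "2017-03-01", "LifetimeSpend": "1234.50"},
    {"Name": "Old Account", "Email": "old@example.com", "Status": "Inactive",
     "CustomerSince": "2012-01-01", "LifetimeSpend": "10"},
]
enrolments = [transform(r) for r in active_customers(crm_export)]
print(enrolments)
```

Keeping the mapping in one table means the architecture document and the code can be reviewed side by side.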

Step 4. System integration development

This step involves actually building what is necessary to perform the tasks required for systems integration. This may mean updating software versions, purchasing and hooking up new hardware, and coding data flows between systems. Testing should be an integral part of this process, although not on live systems! End-to-end testing should be carried out as soon as complete data flows become available.
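One common way to test data flows without touching live systems is to run the integration code against an in-memory stand-in for the target system. The sketch below assumes a hypothetical `LoyaltyClient` interface with a single `enrol` method; `FakeLoyaltyClient` records what would have been sent so a test can assert on it, and a real client implementing the same method would only be swapped in when going live.

```python
from typing import Protocol


class LoyaltyClient(Protocol):
    """Hypothetical interface the integration code depends on."""
    def enrol(self, email: str) -> bool: ...


class FakeLoyaltyClient:
    """In-memory stand-in: records calls instead of contacting a live system."""
    def __init__(self) -> None:
        self.enrolled: list[str] = []

    def enrol(self, email: str) -> bool:
        if email in self.enrolled:
            return False  # mimic a target system that rejects duplicates
        self.enrolled.append(email)
        return True


def sync_customers(emails: list[str], client: LoyaltyClient) -> int:
    """Integration step under test: enrol each customer, count successes."""
    return sum(1 for email in emails if client.enrol(email))


fake = FakeLoyaltyClient()
created = sync_customers(["a@example.com", "b@example.com", "a@example.com"], fake)
print(created, fake.enrolled)  # 2 ['a@example.com', 'b@example.com']
```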

Step 5. Plan for roll out

For critical systems, it’s not wise to go live across all your systems at once. You might go live group by group, or during quieter times, to make sure you don’t cause yourself too many headaches at once. There will likely be problems when you release on your actual systems, which is why you do it this way.
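Going live group by group can be as simple as assigning each user, team or site to a rollout wave deterministically, so the assignment is stable between runs. The sketch below hashes an identifier into one of N waves; the branch names, the wave count and the use of SHA-256 are illustrative choices, not a prescribed method.

```python
import hashlib


def rollout_wave(entity_id: str, waves: int) -> int:
    """Deterministically map an id to a wave number in [0, waves)."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % waves


def is_live(entity_id: str, waves: int, current_wave: int) -> bool:
    """An entity switches to the integrated system once its wave has started."""
    return rollout_wave(entity_id, waves) <= current_wave


branches = ["sydney", "melbourne", "perth", "brisbane", "adelaide"]  # hypothetical groups
for wave in range(3):
    live = [b for b in branches if is_live(b, waves=3, current_wave=wave)]
    print(f"wave {wave}: {live}")
```

Because each wave includes every earlier one, problems found in the first wave affect only a small group.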

Step 6. Go-live

Once all the bugs are ironed out, testing is complete, and you’ve trialled the systems on a smaller scale and verified they work as planned, you can go live with your entire integrated system.

Step 7. Maintenance and support

Always plan to have ongoing maintenance and support after your go-live date. Even though you’ve trialled the systems with a smaller sample size, you will likely require support and fixes from your systems integrator for the first little while after your go-live date – and this may be over an extended period on larger or more complex projects.

What have we missed?

If you are conducting systems integration, then it’s important to pick a knowledgeable partner to help you build reliable systems on time that work as expected and provide extra support when you need it most!

When choosing a systems integration partner, look for those with a good reputation who have completed similar implementations for other companies. You can generally tell who will end up being a good partner, as they come across as knowledgeable, helpful and practical, with a pocketful of suggestions that can aid your systems integration journey.

If you would like to see how your systems could communicate better, whether with commercial products you have purchased along the way (or subscribed to), or with systems you have developed for in-house use, ask us to step in with the roadmap.

CodeFirst takes a structured approach to systems integration to ensure the process runs smoothly, to dependable timeframes, and within reasonable budgets. You can depend on us to deliver bespoke solutions that help your business become more efficient, like a well-oiled machine.

Introducing EmoPy: An Open Source Toolkit for Facial Expression Recognition

MMS Founder
MMS RSS

In a recent ThoughtWorks blog post, Angelica Perez shared information about a new open source project created for an interactive film experience. The project, called EmoPy, focuses on Facial Expression Recognition (FER), providing a toolkit that allows developers to accurately predict emotions from the images passed to it.

Perez defines FER as “an image classification problem located within the wider field of computer vision.” Computer vision is a hot topic, attracting investment from many large cloud providers that democratize access to these machine learning models through public APIs. The challenge, though, is that the models and algorithms behind these services are not made publicly available, and high-quality datasets are difficult to access. Perez explains how EmoPy is different:

Our aim is to widen public access to this crucial emerging technology, one for which the development usually takes place behind commercially closed doors. We welcome raised issues and contributions from the open source development community and hope you find EmoPy useful for your projects.

Having access to a FER training model is very important, and a standard set of emotion classifications is often used:

  1. Anger
  2. Disgust
  3. Fear
  4. Happiness
  5. Sadness
  6. Surprise
  7. Neutral
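Classifiers usually handle these categories as integer indices or one-hot vectors rather than as names. The sketch below encodes the seven classes listed above that way; the ordering is an arbitrary choice for illustration and is not necessarily the index order used by EmoPy or by any particular dataset.

```python
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(EMOTIONS)}


def one_hot(label: str) -> list[int]:
    """Return a vector with a 1 at the label's index and 0 elsewhere."""
    vector = [0] * len(EMOTIONS)
    vector[LABEL_TO_INDEX[label]] = 1
    return vector


def decode(scores: list[float]) -> str:
    """Map a model's per-class scores back to the most likely emotion name."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return EMOTIONS[best]


print(one_hot("fear"))                                       # [0, 0, 1, 0, 0, 0, 0]
print(decode([0.05, 0.02, 0.03, 0.70, 0.05, 0.10, 0.05]))    # happiness
```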

Image source: https://www.thoughtworks.com/insights/blog/recognizing-human-facial-expressions-machine-learning

The EmoPy toolkit was created as part of a ThoughtWorks Arts program, which incubates artists working on social and technology projects. The ThoughtWorks team supported artist-in-residence Karen Palmer in creating an interactive film experience called RIOT.

RIOT places viewers in front of a screen where a contentious video is shown to them. The video clips are based on riot situations involving looters and riot police. Viewers’ facial expressions are captured by a webcam and analyzed with EmoPy.

Image source: https://www.thoughtworks.com/insights/blog/emopy-machine-learning-toolkit-emotional-expression

EmoPy was built from scratch, inspired by the research of Dr. Hongying Meng. The core requirements of EmoPy include the following:

  • Neural network architectures include layers that feed their outputs to each other in sequence. The performance of these architectures is highly dependent on the choice and sequence of the layers that make up the architecture.
  • Dataset selection is really important, as the larger the image library, the higher the accuracy and generalizability of the models. Today, there are not many public datasets available. EmoPy was able to take advantage of Microsoft’s FER2013 and the Extended Cohn-Kanade datasets. The FER2013 dataset includes over 35,000 facial expression images across seven emotion categories: anger, disgust, fear, happiness, sadness, surprise and calm. The Cohn-Kanade dataset contains facial expression sequences rather than still images; each sequence represents a transition between facial expressions. The Cohn-Kanade dataset contains 327 sequences.

Image Source: https://www.thoughtworks.com/insights/blog/emopy-machine-learning-toolkit-emotional-expression

  • The training process is the next consideration the ThoughtWorks team addressed. The process involves training the neural networks on the selected datasets. The dataset was split into two parts: a training set and a validation set. The process then included:
    • Images from the training set were used to train the neural network, which predicts an emotion for each image based on its current weights and parameters.
    • The neural network would then compare the predicted emotion against the true emotion and calculate a loss value.
    • The loss value would then be used to adjust the weights of the neural network. Iterating over this process allowed the prediction model to become more accurate.
    • The validation set is then used to test the neural network after it has been trained. It was very important for the ThoughtWorks team to have two different datasets: by evaluating on images the network had not seen during training, they were able to assess the model more objectively. This approach also helped detect “overfitting”, which is “when a neural network learns patterns from the training samples so well that it is unable to generalize when given new samples.” When overfitting occurs, training set accuracy is much higher than validation set accuracy.
  • Measuring performance was the final requirement for EmoPy. The question the ThoughtWorks team sought to answer was: how accurate are given architectures when predicting emotions on the training set and the validation set? Within the results, the ConvolutionalNN model performed the best. For emotions such as disgust, happiness and surprise, the neural network correctly predicted images it had never seen before 9 out of 10 times. Accuracy is not as high for other emotions, however, and misclassifications are possible, especially in the case of fear. The best way to reduce these misclassifications is to use the largest dataset possible.
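The train-then-validate loop in the list above (predict, compute a loss against the true label, adjust the weights, then measure accuracy on held-out data) can be shown at toy scale. The sketch below is not EmoPy’s model or data: it trains a plain softmax (multinomial logistic regression) classifier with gradient descent on synthetic 2-D points standing in for image features, using only the standard library, and reports training vs. validation accuracy plus per-class validation accuracy. The three class names, the data generator, the 80/20 split and the hyperparameters are all assumptions for illustration.

```python
import math
import random

random.seed(0)
CLASSES = ["happiness", "surprise", "fear"]        # toy subset (hypothetical)
CENTERS = [(2.0, 0.0), (0.0, 2.0), (-2.0, -2.0)]   # synthetic "feature" clusters


def make_samples(n_per_class: int) -> list[tuple[list[float], int]]:
    """Generate noisy 2-D points around each class centre as stand-in features."""
    data = []
    for label, (cx, cy) in enumerate(CENTERS):
        for _ in range(n_per_class):
            data.append(([cx + random.gauss(0, 1), cy + random.gauss(0, 1)], label))
    random.shuffle(data)
    return data


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: shift by the max before exponentiating."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x) -> list[float]:
    """Class probabilities for one sample under weights W and biases b."""
    return softmax([sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(b))])


def accuracy(W, b, data) -> float:
    """Fraction of samples whose highest-probability class is the true label."""
    correct = 0
    for x, y in data:
        p = predict_proba(W, b, x)
        correct += max(range(len(p)), key=p.__getitem__) == y
    return correct / len(data)


data = make_samples(100)
split = int(0.8 * len(data))
train, valid = data[:split], data[split:]  # validation samples are never trained on

K, D, lr = len(CLASSES), 2, 0.1
W = [[0.0] * D for _ in range(K)]
b = [0.0] * K

for epoch in range(20):
    epoch_loss = 0.0
    for x, y in train:
        p = predict_proba(W, b, x)          # 1. predict
        epoch_loss -= math.log(p[y])        # 2. cross-entropy loss vs. the true label
        for k in range(K):                  # 3. gradient step on weights and biases
            grad = p[k] - (1.0 if k == y else 0.0)
            for d in range(D):
                W[k][d] -= lr * grad * x[d]
            b[k] -= lr * grad

train_acc, valid_acc = accuracy(W, b, train), accuracy(W, b, valid)
print(f"mean loss during last epoch {epoch_loss / len(train):.3f}")
print(f"train {train_acc:.2f}  validation {valid_acc:.2f}")  # a large gap would signal overfitting

for label, name in enumerate(CLASSES):  # per-class validation accuracy
    subset = [(x, y) for x, y in valid if y == label]
    if subset:
        print(f"{name}: {accuracy(W, b, subset):.2f}")
```

A real FER model replaces the two hand-made features with learned convolutional layers over pixels, but the split, loss and evaluation steps keep the same shape.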

The EmoPy project is actively looking for contributors. Whether you want to contribute to the project or just use it, the project team has chosen to use an unrestrictive license to make it available to the broadest audience possible. 

Presentation: Spring Framework 5.1 on JDK 8 & 11

MMS Founder
MMS RSS

Presentation: Pivotal Cloud Foundry 2.3: A First Look

MMS Founder
MMS RSS

Agile Data Modeling for NoSQL Databases

MMS Founder
MMS RSS

Pascal Desmarets, CEO of Hackolade, recently spoke at the Data Architecture Summit 2018 conference about agile modeling for NoSQL databases. He said that data modeling is even more important in NoSQL databases, where the constraints imposed by normalization no longer apply. Unstructured and polymorphic big data creates challenges both for data governance and regulation (GDPR and PII) and for the ability to leverage the information accumulated.

Desmarets also talked about how data modeling helps enterprises migrate from RDBMS to NoSQL. Benefits of data modeling in both relational and NoSQL databases include higher application quality, improved data quality, support for GDPR and personally identifiable information (PII) requirements, and business intelligence.

Teams should choose the right NoSQL database based on their requirements. For example, choose a key-value store if you need a simple schema and fast reads and writes without frequent updates. Choose a document store if you have a flexible schema with complex querying. Column-oriented databases are good for extreme write speeds with relatively lower read velocity. And graph databases are better for applications requiring traversal between data points, where you need to store the properties of each data point as well as the relationships between them.
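These rules of thumb can be written down as a small decision helper, which makes the choice explicit in a design review. The sketch below encodes only the criteria named in the paragraph; the `Needs` flags and the order in which they are checked are assumptions, and a real selection weighs more factors than these.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    traverse_relationships: bool = False          # follow links between data points
    extreme_write_volume: bool = False            # very high writes, lighter reads
    flexible_schema_complex_queries: bool = False
    simple_schema_fast_read_write: bool = False


def suggest_family(needs: Needs) -> str:
    """Return the NoSQL family the rules of thumb point to (first match wins)."""
    if needs.traverse_relationships:
        return "graph"
    if needs.extreme_write_volume:
        return "column-oriented"
    if needs.flexible_schema_complex_queries:
        return "document"
    if needs.simple_schema_fast_read_write:
        return "key-value"
    return "no clear match: revisit requirements"


print(suggest_family(Needs(flexible_schema_complex_queries=True)))  # document
print(suggest_family(Needs(traverse_relationships=True)))           # graph
```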

He talked about the traditional data modeling process and how we are transitioning from data modeling to a schema design approach. The conceptual data model has been replaced by domain-driven design (DDD), the logical data model is no longer needed, and the physical data model is replaced by physical schema design.

In the Agile development process, data modeling has a role in every step of the process, including in production. Data modeling effort becomes a shared responsibility and a dialog between multiple project stakeholders.

He also said that domain-driven design (DDD) and NoSQL are made for each other and there is a direct match between DDD language and the concepts of NoSQL databases. He advocates that coherence is necessary throughout strategy, process, architecture, and technology, as it is preferable to apply all these principles together rather than just one or two in isolation: Domain-Driven Design, Agile development, Data-Centricity, Microservices, Event-Driven architecture, NoSQL, DevOps, and Cloud.

InfoQ spoke with Desmarets about the best practices of data modeling of NoSQL databases and big data management.
 

InfoQ: Is the data modeling approach different for each NoSQL database type, e.g. time-series database like Cassandra v. graph database like Neo4j?

Pascal Desmarets: The global method is very similar, but the implementation can be vastly different. To leverage the benefits of NoSQL, it is critical to design data models that are application-specific, so you store information in a way that optimizes query performance. This mind shift can be a challenge for those who have been applying the principles of application-agnostic data modeling for decades. For many of us, the rules of normalization have become second nature, and we have to force ourselves to apply a query-driven approach to schema design for NoSQL databases.
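To make the contrast concrete: a normalized, application-agnostic design stores each fact once and joins at read time, while a query-driven design shapes the stored data around a known access pattern. The sketch below uses plain Python dicts as a stand-in for tables and for a JSON document; the entities, fields and the "show an order with its customer and line items" query are hypothetical examples, not Hackolade output or any database’s API.

```python
# Normalized (application-agnostic): each fact stored once, joined at read time.
customers = {"c1": {"name": "Ana Lopez", "city": "Lyon"}}
products = {"p1": {"title": "Kettle", "price": 30}, "p2": {"title": "Mug", "price": 8}}
orders = {"o1": {"customer_id": "c1"}}
order_lines = [
    {"order_id": "o1", "product_id": "p1", "qty": 1},
    {"order_id": "o1", "product_id": "p2", "qty": 2},
]


def order_view_normalized(order_id: str) -> dict:
    """Answer 'show order with customer and items' by joining three collections."""
    order = orders[order_id]
    items = [
        {"title": products[line["product_id"]]["title"], "qty": line["qty"],
         "price": products[line["product_id"]]["price"]}
        for line in order_lines if line["order_id"] == order_id
    ]
    return {"order_id": order_id,
            "customer": customers[order["customer_id"]]["name"],
            "items": items}


# Query-driven (application-specific): one document shaped like the screen that
# reads it, so the same query is a single key lookup. Customer name, product
# title and price are duplicated and must be kept in sync when they change.
order_documents = {
    "o1": {
        "order_id": "o1",
        "customer": "Ana Lopez",
        "items": [
            {"title": "Kettle", "qty": 1, "price": 30},
            {"title": "Mug", "qty": 2, "price": 8},
        ],
    }
}

assert order_view_normalized("o1") == order_documents["o1"]
print(order_documents["o1"]["customer"])  # Ana Lopez
```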

The query-driven methodology is fairly similar for all families of NoSQL databases. But when it comes to the specific aspects of schema design for each technology, the biggest difference is between graph databases and the three other families of NoSQL DBs. Because of the nature of graph databases, good graph traversal performance when executing queries may require that something stored as an attribute in any other technology be promoted to the status of an entity (or node) in a graph database.
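A small example of that promotion: in a document or relational design, a person’s city is just an attribute, but if the application asks "who else lives in the same city?", a graph model would typically make the city a node and connect people to it, so the question becomes a two-hop traversal instead of a scan over every person. The sketch below uses a plain adjacency dict rather than a real graph database; the node labels and the LIVES_IN edge name are hypothetical.

```python
from collections import defaultdict

# Attribute style: the city is a property stored on each person record.
people = {"ana": {"city": "Lyon"}, "raj": {"city": "Pune"}, "li": {"city": "Lyon"}}


def neighbours_by_scan(person: str) -> set[str]:
    """Without a City node, every person has to be examined."""
    city = people[person]["city"]
    return {p for p, props in people.items() if props["city"] == city and p != person}


# Graph style: the city is promoted to a node, with edges stored both ways.
edges: dict[str, set[str]] = defaultdict(set)
for name, props in people.items():
    city_node = f"City:{props['city']}"
    edges[f"Person:{name}"].add(city_node)  # Person -[LIVES_IN]-> City
    edges[city_node].add(f"Person:{name}")  # reverse direction for traversal


def neighbours_by_traversal(person: str) -> set[str]:
    """Two hops: person -> city node -> other people linked to that city."""
    start = f"Person:{person}"
    found = set()
    for city_node in edges[start]:
        for other in edges[city_node]:
            if other != start:
                found.add(other.removeprefix("Person:"))
    return found


print(neighbours_by_scan("ana"), neighbours_by_traversal("ana"))  # {'li'} {'li'}
```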

Then you have differences within each family of NoSQL databases. For graph databases, for example, there’s a fundamental difference between property graph DBs and RDF triple stores. Within JSON document databases, you will find structural storage differences between Couchbase and MongoDB, for example. Similarly, HBase and Cassandra have very different approaches to data storage.

InfoQ: Can you talk about some best practices in agile modeling of NoSQL databases?

Desmarets: Data modeling as we’ve known it for decades is under a lot of pressure in the context of Agile development. Despite attempts to make data modeling more agile, it is often viewed as a bottleneck to the speed and cadence of two-week sprints. And data modelers across the world feel left out of the process. The truth is that some form of schema design is indispensable, meaning that data modeling needs to be re-invented to remain relevant.

First, data modeling needs to be an iterative process through the development sprints and through the application lifecycle, instead of being a heavy front-loaded task.

Data modeling also needs to be a collaborative process between data modelers (who are outstanding at abstracting their understanding of the business) and developers (who really understand how to translate requirements into code).

This requires that data modelers be an integral part of the scrum teams.

The methodology needs to be adapted to the development techniques and the technology stack, in particular with a query-driven and application-specific approach as described earlier. It must combine Domain-Driven Design, user experience and flowcharting of business rules with screen wireframes and reports, to derive the application queries to take into account when designing the schemas.

Finally, we need next-gen tools that are nimble and adapted to this new landscape!

He also said that for quite some time, NoSQL database vendors have differentiated themselves and created buzz by using terms like ‘schema-less’ or ‘non-relational’. But NoSQL databases are so flexible and powerful that inexperienced users can easily get in trouble if they don’t apply some rigorous techniques. And the vendors now realize that, in order to sell their solutions to enterprises, it is wiser to use the term ‘dynamic schema’ instead. Data modeling (or schema design) is in fact more important when dealing with NoSQL than it was with relational databases. We just need a different kind of data modeling than in the past. And data modelers should embrace Agile development and learn the implications of new technology stacks to prove their added value in the process.
