Raul Salas, Author at Mobile Monitoring Solutions

Uncategorized

How companies use Big Data

MMS • Raul Salas

An interesting executive survey from the April 2017 Harvard Business Review on the status of Big Data projects within organizations.

The Harvard Business Review survey shows that 48% of companies use Big Data for expense reduction, while 36% use Big Data to add revenue; 48.4% report measurable results from their Big Data investments, and 87% of executives say their Big Data projects were “successful”.

This is good news for open source Big Data products such as Hadoop, MongoDB, Spark, and Cassandra, as corporations realize the value of both structured and unstructured data, as well as of Data Scientists and Engineers. This sets the stage for the coming Machine Learning and AI revolution as Big Data infrastructure is built out and the data sets become more meaningful!

http://hbr.org/2017/04/how-companies-say-theyre-using-big-data

Author: Raul Salas

Uncategorized

MongoDB Replication and Fault Tolerance

MMS • Raul Salas

MongoDB Replication and Fault Tolerance

There is a lot of confusion about when a MongoDB replication and fault tolerance solution is appropriate, as well as about what replication is and what it is not. In this blog post, we will take a closer look at MongoDB replication.

What it is

MongoDB (Mongodb.com) replication provides fault tolerance. Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components. The ability to maintain functionality when portions of a system break down is referred to as graceful degradation.

MongoDB replication provides the ability to bring down a server for maintenance. Tasks such as operating system patching, MongoDB patching, or hardware replacement can be performed without taking a database outage.

MongoDB replication provides the ability to perform other tasks on the secondary hosts, such as backups and redirected reads.
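Redirected reads are normally requested by the client through a read preference. The following is a minimal sketch, using only the Python standard library, of how a connection string for a replica set might be assembled. `replicaSet` and `readPreference` are standard MongoDB connection string options; the helper name `build_replica_set_uri`, the host names, and the replica set name `rs0` are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def build_replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string that lists every replica set member."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


# Hypothetical host names and replica set name, for illustration only.
uri = build_replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
)
print(uri)
```

Keep in mind that reads served by a secondary can lag slightly behind the primary, so this option suits reporting and backup-style workloads better than reads that must see the latest write.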

What it is not…

MongoDB replication is not a load balancer where writes can be balanced across multiple hosts.

MongoDB replication is not sharding, also known as horizontal scaling. (That question has come up!)

So, let’s dive into the details of the issues to consider when deploying a replication high availability solution.

Your business requires some level of fault tolerance. For example, an online store will suffer a significant financial loss if the database becomes unavailable for any amount of time.
Typical reasons for outages are as follows: patching, hardware failure, network failure, and data center outage.
You cannot have two masters. MongoDB provides primary-secondary (master-slave) replication, where all writes occur on one host and are replicated to the other, read-only secondary hosts.

Fault Tolerance

The next consideration is what level of fault tolerance your business requires. There are differing levels of fault tolerance as listed below:

Number of Members	Majority Required to Elect a New Primary	Fault Tolerance
3	2	1
4	3	1
5	3	2
6	4	2
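The numbers in the table follow a simple rule: a new primary needs votes from more than half of the members, and the fault tolerance is whatever is left over. Here is a short sketch of that arithmetic, assuming every member is a voting member; the function names are illustrative and not part of any MongoDB API.

```python
def majority_required(members: int) -> int:
    """Votes needed to elect a primary: more than half of the voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Members that can be lost while the rest can still elect a primary."""
    return members - majority_required(members)


# Reproduce the rows of the table above.
rows = [(n, majority_required(n), fault_tolerance(n)) for n in range(3, 7)]
for members, majority, tolerance in rows:
    print(members, majority, tolerance)
```

Note that going from 3 to 4 members, or from 5 to 6, raises the required majority without raising the fault tolerance, which is why odd member counts are the usual choice.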

The question is: what is the right fault tolerance configuration for your organization? The answer is a mix of budget and business considerations.

To achieve a fault tolerance of 1 with MongoDB replication, you need a minimum of three hosts. If one host goes offline, the two remaining hosts still form a majority and can elect a primary.

To achieve a fault tolerance of 2, you need five hosts. This requires two additional hosts and duplicate storage for each host. This can drive up expenses quickly, and you will need to justify the cost financially. In addition, your tolerance for data loss and downtime will need to be analyzed.

Financial applications, such as shopping carts for an online merchant, are best suited to a fault tolerance of 2, as any data loss or downtime will result in a significant financial loss.

For most use cases, such as a cache for a mobile app that allows customers to locate a doctor in their health plan or review their health coverage, a fault tolerance of 1 will be sufficient.

As you can see, MongoDB replication and fault tolerance is a powerful method to ensure uptime and availability of your MongoDB-hosted application.

Author: Raul Salas www.mobilemonitoringsolutions.com

Uncategorized

Which NoSQL database do I use?

MMS • Raul Salas

Which database do I use? I get this question often from senior management. In the open source NoSQL world, this is a very complicated question. The first question that needs to be answered is: what are you trying to accomplish? We should start by outlining which open source platforms are available.

Hadoop – Hadoop is an open source framework for managing and processing large data sets in a distributed computing environment. It is designed to scale from single servers to thousands of machines, each providing computation and storage.
Cassandra – Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure as well as bi-directional replication across data centers.
Spark – Apache Spark is an open source engine for large-scale data processing. It is optimized for running multiple parallel operations on the same data set, as happens in many iterative machine learning tasks.
MongoDB – MongoDB offers rapid and agile development through its ability to handle unstructured data, its high availability, and its ability to scale across commodity hardware.

OK, so that’s really a lot to consume! At this point, you’re probably still thinking, “Which database platform do I use?” The best way to explain each platform is by use case.

Use Case #1 – (Cassandra) Let’s start with larger companies, especially global ones, that have multiple data centers and need a single pane of glass view of their data. While this sounds simple, large companies spend lots of money shipping data across multiple platforms and systems to ensure a timely, accurate reflection of their data. Cassandra can provide that functionality. Cassandra can be set up in multiple data centers and updated in any of them, and the data will be reflected in every location with eventual consistency. “Eventual consistency” is an important term: it does not mean that the data will be updated in real time. It does mean that at some point, usually not far from real time, the data will be updated in both data centers. This means that, for example, a health care company can quickly and, more importantly, easily obtain the information it wants from a central data store. It can track policy growth across multiple countries without many processes massaging and shipping the data over to a consolidated data store.

Use Case #2 – (Hadoop) Your company wants a 360 degree view of its customer base. For example, a company wants to consolidate its CRM data, web site browsing data, call center activity, and social media sentiment analysis of its brand, both by the general public and by its customers. This use case involves lots of data from disparate places in both structured and unstructured form. In addition, lots of storage and compute power will be required. Apache Hadoop fits this requirement perfectly! Hadoop can bring many data formats and structures into its own file system, called HDFS. The way HDFS operates makes storage very affordable. In addition, Hadoop’s ability to scale horizontally and use map-reduce on large data sets makes processing very large amounts of data possible as well. Hadoop can perform sentiment analysis on social media posts to determine what the general public thinks about your product and company. All of this can be combined and analyzed as a single pane of glass into your customers’ purchases, contacts, sales leads, website activity, and social media posts in order to obtain a 360 degree view of your company’s customers and business.

Use Case #3 – (Spark) You’re a credit card processor and you would like to prevent credit card fraud in real time or near real time. This is where Spark comes into play. Spark can hold the data temporarily in memory and execute machine learning algorithms on it in real time, making decisions based on multiple inputs as to whether a transaction is fraudulent. Spark acts somewhat like a scalable, temporary, read-only store.

Use Case #4 – (MongoDB) Your organization wants to quickly create a data warehouse from various sources of data that need to be related and consolidated into a single view for business analysis. MongoDB’s flexible schema allows dissimilar and unstructured data sources to be related and combined for easy analysis. For large data sets, horizontal scaling can be used to split the data across multiple commodity servers in order to meet response time SLAs.

Hopefully, this explains in a simple manner which NoSQL database you should use.
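To make the eventual consistency idea from the Cassandra use case more concrete, here is a toy sketch in plain Python. It is not Cassandra's actual replication protocol: the `DataCenter` class, the key name, and the use of a "newest timestamp wins" rule to settle conflicts are all simplifying assumptions chosen for illustration.

```python
from collections import deque


class DataCenter:
    """A toy key-value store standing in for one data center."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()  # writes not yet shipped to the peer

    def write(self, key, value, timestamp):
        # Accept the write locally and queue it for later replication.
        self._apply(key, value, timestamp)
        self.outbox.append((key, value, timestamp))

    def _apply(self, key, value, timestamp):
        # Last write wins: keep the value with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[1]:
            self.data[key] = (value, timestamp)

    def replicate_to(self, peer):
        # Ship every queued write to the peer, emptying the queue.
        while self.outbox:
            peer._apply(*self.outbox.popleft())

    def read(self, key):
        entry = self.data.get(key)
        return entry[0] if entry else None


east, west = DataCenter("east"), DataCenter("west")
east.write("policy-count", 100, timestamp=1)
west.write("policy-count", 120, timestamp=2)

# Before replication runs, the two data centers disagree.
before = (east.read("policy-count"), west.read("policy-count"))
east.replicate_to(west)
west.replicate_to(east)
# After replication, both hold the most recent write.
after = (east.read("policy-count"), west.read("policy-count"))
print(before, after)
```

The gap between `before` and `after` is the window in which a reader may see stale data; "eventual" means that window closes once replication catches up.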

Uncategorized

MongoDB and Personalization

MMS • Raul Salas

A few months ago, I got a call for a MongoDB project that involved Personalization. The first thing I asked was, “What is Personalization?” I accepted the project and discovered a whole new area of technology that is quickly becoming interwoven into our daily lives. The best way to explain Personalization is by use case. For example, Amazon recommends items that we might be interested in purchasing based on our demographics, behavior, and browsing and purchasing habits. Facebook sends ads our way based on our past likes and posts. Netflix recommends movies based on our past viewing habits. All of this was difficult, if not impossible, with older, rigid relational database technologies. With the rise of open source unstructured database technology, Personalization has come into the mainstream.

So your CIO may be asking, “Why can’t I do this with my existing relational databases?” Existing relational databases support transactions, which means overhead in ensuring that each transaction is correct. You wouldn’t want your bank to double deduct an ATM transaction or process a check incorrectly. In addition, rigid table structures make it difficult, if not impossible, to manage diverse data sources and complex data models quickly and easily. Furthermore, once data grows exponentially, it becomes very difficult to scale horizontally to get the performance needed to keep the data usable. Unstructured open source databases such as MongoDB can easily handle complex unstructured and variously structured data from different data sources, and can scale horizontally to meet service level agreements for query response times.
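As a rough illustration of the flexible schema point, the sketch below uses plain Python dictionaries as stand-ins for MongoDB documents; it makes no database calls. The event records, field names, and the `build_profiles` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical events from three sources; each source uses its own fields.
events = [
    {"customer_id": 1, "source": "web", "viewed": ["tent", "stove"]},
    {"customer_id": 1, "source": "orders", "purchased": "tent", "amount": 129.0},
    {"customer_id": 2, "source": "social", "liked": ["hiking"]},
]


def build_profiles(events):
    """Group differently shaped records into one nested profile per customer."""
    profiles = defaultdict(lambda: {"events": []})
    for event in events:
        # Store every field as-is; no shared table schema is required.
        record = {k: v for k, v in event.items() if k != "customer_id"}
        profiles[event["customer_id"]]["events"].append(record)
    return dict(profiles)


profiles = build_profiles(events)
print(profiles[1])
```

In a relational design, each of these sources would typically need its own table and a join to assemble the profile; here the differently shaped records simply sit side by side in one document.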

Sitecore (www.Sitecore.com) web content management software is the latest vendor product to require MongoDB for its Personalization functionality, as well as for high availability and scalability on global, high-traffic commercial websites. Expect many other vendors to make similar announcements and for Personalization to grow as Artificial Intelligence and Machine Learning technology expands into the real world.
