Mobile Monitoring Solutions


Don’t miss Snowflake’s Cloud Data Summit 2020!

MMS Founder
MMS Raul Salas

Starting Tuesday, November 17 at 8:30AM Pacific

Join data luminaries, Snowflake business and technology experts, customers, and partners to hear how the Data Cloud can connect you to a world of data. You’ll also hear about the latest technology advancements in Snowflake’s cloud data platform.

Choose among dozens of sessions across a number of summit tracks to help you unlock the value of the Data Cloud. You’ll gain the knowledge to become a data leader and help guide your team toward mobilizing your data.

Connect with thousands of your peers virtually and learn how organizations just like yours are using Snowflake in innovative ways to deliver all the insight from all their data to all their users.



Snowflake: What is a Virtual Warehouse and how to use it

MMS Founder
MMS Raul Salas

Everyone is excited about the Snowflake cloud database! IPO stock price aside, Snowflake really is a revolutionary concept. Today we will take a closer look at a major architectural component of Snowflake: the virtual (data) warehouse.

At their core, virtual warehouses are simply one or more clusters of compute resources used to process queries and other DML operations. You can create many warehouses and use the web user interface to run queries and other DML operations against your data.

You can use separate virtual warehouses for different purposes, such as loading and analysis, or to support software development lifecycle stages such as development, test, integration, and production.

[Screenshot: creating a virtual data warehouse in Snowflake]

You can then configure the warehouse with a name and a size (always pick the smallest and scale up). Next come the maximum number of clusters (in this case we will select 2) and the minimum number of clusters (set to 1). The scaling policy is set to Standard; additional clusters kick in when activity picks up and resources are consumed. The Auto Suspend option deactivates the warehouse after 10 minutes (600 seconds) of inactivity, which reduces costs: you only pay for what you use! The Auto Resume option starts the warehouse back up when activity resumes.

In addition, the “Show SQL” feature (bottom left) generates the equivalent command so you can create the virtual warehouse from the command line:

CREATE WAREHOUSE Test_DW WITH
  WAREHOUSE_SIZE = 'XSMALL'
  WAREHOUSE_TYPE = 'STANDARD'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 2
  SCALING_POLICY = 'STANDARD'
  COMMENT = 'create virtual warehouse example';

Now that your virtual warehouse is created, the fun begins! You can load some sample data, create databases, and then select which virtual warehouse to run your queries on. In the screenshot below you can see the warehouse selector. For the traditional on-prem database administrator, this is a dream: you can run your queries on the virtual warehouse compute power of your choice! This is a significant switch, since the DBA is usually restricted to a specific cluster or piece of hardware.

You can now choose among multiple virtual warehouses to execute your query! This is great for software development lifecycle work such as development and testing: no more loading your data into different environments before you can use it. A simple configuration change, shown in the sketch below, and you’re in a different virtual warehouse.
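As a minimal sketch, switching compute is a single statement. Test_DW is the warehouse created above; Prod_DW and my_table are hypothetical names used only for illustration:

USE WAREHOUSE Test_DW;
SELECT COUNT(*) FROM my_table;  -- runs on Test_DW compute

USE WAREHOUSE Prod_DW;          -- hypothetical second warehouse
SELECT COUNT(*) FROM my_table;  -- same data, different compute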

[Screenshot: selecting a virtual warehouse in the Snowflake web interface]

As you can see below, we ran the

select current_database(), current_schema();

command to see the current database and schema.
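If those functions return null, no context has been set for the session yet. A quick sketch of setting one (DEMO_DB and PUBLIC are example names, not from the post):

USE DATABASE DEMO_DB;
USE SCHEMA PUBLIC;
SELECT CURRENT_DATABASE(), CURRENT_SCHEMA();  -- now returns DEMO_DB, PUBLIC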

[Screenshot: query results showing the current database and schema]

To summarize: create a virtual warehouse, select your database and the commands you wish to run, and execute your queries with the virtual warehouse of your choice! Snowflake makes it easy to learn this helpful interface and to climb the learning curve that comes with any new platform. In addition, with the help of certified Snowflake partners, there is no limit to the applications for businesses large and small.



Cloning Healthcare Data: A Snowflake Guide

MMS Founder
MMS Raul Salas

Snowflake’s cloning and data masking are the answer to the security restrictions surrounding PHI (protected health information) in healthcare data.

Snowflake has managed to solve issues that many database administrators face in corporate environments where they manage personal information. These issues are magnified in the healthcare industry, where protecting personal health data is the highest priority and makes even simple operational processes very difficult, if not impossible to consider.

For example, migrating and cleansing production data involves batch job and backup processes that move data across servers over the network, followed by programs that mask and scramble personal health information. Developers are left with stale data to work with, even in their test environments.

Making changes to data that is incorrect in production is a labor and resource intensive process.

Snowflake’s cloning and data masking functionality will simplify many healthcare production processes as well as make them more secure. In this blogpost we will explore Snowflake’s impact on a typical healthcare operational database and its software development processes. NOTE: we will explore Snowflake’s cloning and data masking technical implementation (reference architecture) in more depth in a future blogpost.

In short, here is what Snowflake’s cloning functionality can do:

1. Clone very large databases and tables in seconds, as many times as you want.
2. Pay only for the data you store and update; clones share the original production storage, so you pay once. If production uses 2 TB, you are charged for only one copy of the data.
3. Update data in test automatically.
4. Promote any test data to integration or production rapidly.
5. Re-clone at any time so your test data keeps pace with production changes.
6. Go back to a specific time or query, clone the database state as of that moment, and create archived databases easily (see the sketch after this list).
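As a rough sketch of what this looks like in SQL (PROD_DB, DEV_CLONE, ARCHIVE_DB, and the timestamp are example values, not from the post), a zero-copy clone and a Time Travel clone are each one statement:

-- zero-copy clone for development; shares production storage
CREATE DATABASE DEV_CLONE CLONE PROD_DB;

-- clone the database state as of a specific point in time
CREATE DATABASE ARCHIVE_DB CLONE PROD_DB
  AT (TIMESTAMP => '2020-11-01 00:00:00'::TIMESTAMP_LTZ);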

So cloning health data takes only a few seconds, and your developers can start their work. In addition, changes made to the clone can easily be promoted to production. Think about all the hours and manpower currently spent on the manual processes that support copying production data down to lower environments.

Now, let’s combine this with Snowflake’s data masking, and this is where things start to get really interesting:

1. Mask field-level PHI in production (a sketch of a masking policy follows this list).
2. Clone production to the test environment.
3. Apply security roles that control who sees personal health information.
4. Make changes in the test environment (new diagnostic codes, for example).
5. Promote the cloned test environment to integration and verify there are no issues with the changes or data.
6. Promote the cloned integration environment to production.
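Here is a minimal sketch of step 1 using Snowflake’s dynamic data masking. The policy, role, table, and column names (phi_mask, PHI_ADMIN, patients, ssn) are hypothetical:

-- only the PHI_ADMIN role sees the real value; everyone else sees a mask
CREATE MASKING POLICY phi_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PHI_ADMIN') THEN val ELSE '***MASKED***' END;

-- attach the policy to a PHI column
ALTER TABLE patients MODIFY COLUMN ssn SET MASKING POLICY phi_mask;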

Snowflake’s new data masking, combined with cloning, solves many issues in managing data environments, and it gives DBAs the ability to sleep through the night and have a weekend to spend with their families!

Snowflake has really solved many of the operational and security issues that have plagued healthcare databases for the past 20 to 30 years!



Snowflake: Migrating to the Cloud

MMS Founder
MMS Raul Salas

The main driving factors in migrating to the cloud are cost of ownership and the ability to scale a database environment rapidly with no impact to the application. Cloud databases address many of these driving factors, and many businesses today are planning to move their databases to the cloud.

In recent months, this push has been accelerating. Of the cloud-based data warehouses currently on the market, Snowflake has been the first choice for many businesses aiming to make this transition, and for good reason.

One of the most prominent advantages of Snowflake is that tuning and indexes are not needed, especially when running relational queries against structured data. In addition, scaling up or down can be done without disruption, as sketched below. Database management is automated, and query concurrency is effectively unlimited. Built-in high availability and 90-day data retention protect against node failures, human error, unforeseen events, and malicious attacks.
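As a sketch of how non-disruptive this is, resizing a warehouse or widening its cluster range is a one-line change while queries keep running (MY_WH is an example name; multi-cluster warehouses require Enterprise edition or above):

ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE MY_WH SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 4;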

IDC predicts that by 2023, 50% of data and analytics revenue will come from public clouds, representing a 29.7% compound annual growth rate. This means public cloud deployments are projected to grow eight times faster than on-premises deployments, which sit at 3.7%. Gartner projects an even more aggressive switch to the cloud than IDC, predicting that 75% of all databases will be deployed on or migrated to a cloud platform by 2022. Further, Gartner predicts only 5% of these workloads will ever return to on-premises deployments.

The ability of Snowflake to scale so easily and effectively is partially a function of its architecture. Snowflake does not store data on separate nodes and does not have the limitations of on-premises SAN storage. For storage, Snowflake fully encrypts and compresses data to bring down costs. It also extracts metadata to enable a more efficient query process with no need for indexing: by retrieving the minimum amount of data needed from storage to run a query, Snowflake can process requests more quickly. Caching data and query results locally yields faster performance with less compute, and parallel processing across multiple compute engines provides a consistent view of the data with no blocking.

The services layer consists of stateless compute resources running across multiple AWS availability zones. It provides the unified data presentation layer, including replication and data exchange, and handles all security functions, management, query compilation and optimization, and transaction coordination.

In short, cloud databases will be all the rage in 2020 and into 2021; simplicity of management and the ability to scale easily will be significant factors leading management to adopt Snowflake.



This is the first in a series of blogposts on Snowflake, a cloud-based data warehouse.

MMS Founder
MMS Raul Salas

Snowflake is quickly being adopted by many companies in their migration to the cloud. Launched in 2014, Snowflake is fast becoming the premier data warehousing solution for businesses of all sizes. As a cloud-based service for structured and semi-structured data, Snowflake has, in their own words, “combined the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions.” Snowflake’s key strength is its separation of storage from compute.

Their fee structure allows businesses to pay for compute only while it is in use, without the immediate upfront cost of setting up a data platform of their own. For example, a warehouse can automatically shut down when not in use; most cloud-based data warehouse systems continue to bill even when idle.

By building everything on cloud infrastructure and reducing the need for costly hardware, Snowflake users can capitalize on a fast and powerful analytics platform to streamline their own operations. Once set up and configured, an installation can be administered without a full-time database administrator or SAN storage administrator.

Furthermore, Snowflake can be a strong replacement for SQL Server, Hadoop, and Vertica in data warehouse use cases. Snowflake complements MongoDB by loading JSON document data natively, without requiring transformation to a fixed relational schema, saving time and headaches along the way. Snowflake also handles JSON far more easily than providers like AWS Redshift because it stores JSON natively. MongoDB data can be copied into Snowflake in small batches every few minutes to lower the performance impact on the source databases, which increases the responsiveness and uptime of the Snowflake data warehouse. With effectively unlimited storage, coordinating with an internal on-premises storage group becomes unnecessary. In short, Snowflake is the premier cloud-based database in terms of performance and total cost of ownership, and definitely a player to watch in the emerging database scene.
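A minimal sketch of that native JSON handling, with hypothetical table and field names (mongo_events, patient, visits):

-- a VARIANT column holds JSON documents with no fixed schema
CREATE TABLE mongo_events (doc VARIANT);

INSERT INTO mongo_events
  SELECT PARSE_JSON('{"patient": {"id": 42, "name": "Jane"}, "visits": 3}');

-- query nested fields directly with path syntax and casts
SELECT doc:patient.name::STRING AS patient_name,
       doc:visits::INT AS visits
FROM   mongo_events;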

If this sounds too good to be true, sometimes a good thing really is true. Future blogposts will offer more in-depth technical deep dives.



How to afford a switch to a STEM career

MMS Founder
MMS Raul Salas

https://www.bankrate.com/loans/how-to-afford-the-switch-to-a-stem-career/

If you’re unsatisfied with your current career, changing to a career in STEM — which stands for science, technology, engineering and math — might be a solid option. 

Employment in STEM occupations has grown 79 percent since 1990, from 9.7 million to 17.3 million jobs, outpacing overall U.S. job growth. The thirst for STEM workers hasn’t subsided, either: the demand for STEM professionals creates a huge need for new entrants into the STEM workforce.

Transitioning to a STEM career can come with financial barriers, but it can be worth the initial investment in the long run. Personal loans, grants and other sources of funding can mitigate career-change expenses.



Neo4j database drops and restores

MMS Founder
MMS Raul Salas

Ok, so the latest database platform for artificial intelligence is Neo4j (www.neo4j.com), a knowledge graph database that provides in-depth analytics and insight into your data. It is a relative newcomer to the database world, and with that comes less mature functionality than is taken for granted in the relational database industry. Operations such as dropping and restoring databases must be performed manually across all cluster nodes, and the cluster requires a full outage, which means loss of high availability for a period of time. This is a less than ideal situation even for development environments, especially with the around-the-clock offshore development cycles that are the norm in many shops.

Be prepared for developers to treat your database environment as disposable as they iterate through their development cycle troubleshooting the new technology. This usually means deleting and restoring databases multiple times a day. The following steps outline deleting and restoring databases, enough to get an administrator up to speed.

Developers may want to start out with a fresh full load and will ask you to delete the database data so they can restart a batch load from an external source such as Kafka or Hadoop.

Delete Neo4j Data

1. Make sure you bring down all instances in the cluster before doing any other steps!

cd /neo4j-enterprise-3.5.4/bin

./neo4j stop

2. After all Neo4j instances are down, issue the following commands on each node:

 cd /neo4j-enterprise-3.5.4/bin

./neo4j stop

./neo4j-admin unbind

# WARNING: this permanently deletes all database files on this node
rm -rf /neo4j/data/databases/*

Once the data directories on all nodes are deleted, start up the cluster nodes:

 ./neo4j status

./neo4j start

tail -f /neo4j/logs/neo4j.log (logs should eventually show the instance coming up waiting for additional cluster members)

At this point you should have a blank-slate cluster ready for testing!

Developers may also want to freeze data with a backup and revert to it at will, so here are the steps to restore a Neo4j database.

Restore Neo4j Database

On each host, issue the following commands to refresh the database.

1. Make sure you bring down all instances in the cluster before doing any other steps!

cd /neo4j/bin/neo4j-enterprise-3.5.5/bin

./neo4j stop

2. After all Neo4j instances are down, issue the following commands on each node, and wait for Neo4j to start up without error before moving to the next node. (Of course, you will need to pre-stage your backup in the restore location below.)

cd /neo4j/bin/neo4j-enterprise-3.5.5/bin

./neo4j stop

./neo4j-admin unbind

./neo4j-admin restore --from=/neo4j/data/test_restore/graph.db-backup --database=graph.db3.5 --force

./neo4j status

./neo4j start

tail -f /neo4j/logs/neo4j.log (logs should eventually show the instance coming up waiting for additional cluster members)

3. If for some reason the node does not come up, delete the data directory /neo4j/data/databases

and re-execute the tasks in step #2 above.

Raul Salas

Raul@mobilemonitoringsolutions.com



The real competition between the U.S. and China will be in artificial intelligence and data

MMS Founder
MMS Raul Salas



The White House should worry less about China’s progress and invest heavily in artificial intelligence breakthroughs, according to Kai-Fu Lee.

MMS Founder
MMS Raul Salas

The government is not doing enough to prepare the American population for AI’s impact on the economy and jobs.

http://www.technologyreview.com



Top 10 Algorithms used by Data Scientists

MMS Founder
MMS Raul Salas

Interesting that regression is at the top!

[Image: chart of the top 10 algorithms used by data scientists]
