Month: August 2020
MMS • Raul Salas
The main factors driving migration to the cloud are total cost of ownership and the ability to scale a database environment rapidly with no impact on the application. Cloud databases address both concerns, and many businesses today are planning to move their databases to the cloud.
In recent months, this push has been accelerating. Of the cloud-based data warehouses currently on the market, Snowflake has been the first choice for many businesses making this transition, and for good reason.
One of Snowflake's most prominent advantages is that tuning and indexes are not needed, especially for relational queries against structured data. Scaling, whether up or down, can be done without disruption. Database management is automated, and query concurrency is effectively unlimited. Built-in high availability and up to 90 days of data retention protect against node failures, human error, unforeseen events, and malicious attacks.
IDC predicts that by 2023, 50% of data and analytics revenue will come from public clouds, representing a 29.7% compound annual growth rate. Public cloud deployments are thus projected to grow eight times faster than on-premises deployments, which sit at 3.7%. Gartner projects an even more aggressive shift to the cloud than IDC, predicting that 75% of all databases will be deployed on or migrated to a cloud platform by 2022. Further, Gartner predicts only 5% of these workloads will ever return to on-premises deployments.
Snowflake's ability to scale so easily and effectively is partly a function of its architecture. Snowflake does not tie data to individual nodes and does not share the limitations of on-premises SAN storage. For storage, Snowflake fully encrypts and compresses data to bring down costs. It also extracts metadata to enable a more efficient query process with no need for indexing: by retrieving only the minimum data needed from storage to answer a query, Snowflake can process requests faster. Caching data and query results locally yields better performance with less compute, and parallel processing across multiple compute engines provides a consistent view of the data without blocking.
A layer of stateless compute resources runs across multiple AWS availability zones. This layer provides the unified data presentation layer, including replication and data exchange, and handles security functions, management, query compilation and optimization, and transaction coordination.
In short, cloud databases will be all the rage in 2020 and into 2021; simplicity of management and the ability to scale easily will be significant factors leading management to adopt Snowflake.
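The metadata-driven pruning described above can be illustrated with a minimal sketch. This is not Snowflake's actual implementation: the partition size, structure, and function names below are illustrative assumptions. The idea is that each storage partition records min/max metadata per column, and a range query skips any partition whose metadata proves it contains no matching rows.

```python
# Illustrative sketch of metadata-based pruning (zone maps), NOT Snowflake
# internals. Each "micro-partition" records min/max metadata; a range
# predicate skips partitions that the metadata proves cannot match.

def build_partitions(rows, size):
    """Split sorted rows into partitions and record min/max metadata."""
    parts = []
    for i in range(0, len(rows), size):
        chunk = rows[i:i + size]
        parts.append({"meta": (min(chunk), max(chunk)), "rows": chunk})
    return parts

def pruned_scan(parts, lo, hi):
    """Return matching rows, scanning only partitions whose range overlaps [lo, hi]."""
    scanned = 0
    hits = []
    for p in parts:
        pmin, pmax = p["meta"]
        if pmax < lo or pmin > hi:
            continue  # pruned: metadata alone rules this partition out
        scanned += 1
        hits.extend(r for r in p["rows"] if lo <= r <= hi)
    return hits, scanned

parts = build_partitions(list(range(1000)), size=100)   # 10 partitions
hits, scanned = pruned_scan(parts, 250, 349)
print(len(hits), scanned)  # 100 matching rows, only 2 of 10 partitions read
```

Because pruning happens on metadata before any data is read, no user-maintained index is required; this is the sense in which "retrieving the minimum amount of data needed" speeds up queries.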
MMS • Bruno Couriol
Twitter recently released the new Twitter API (early access) to be used by third-party developers. The new Twitter API features three new product tracks: standard, academic research, and business. The new API offers conversation threading, poll results in Tweets, pinned Tweets on profiles, spam filtering, real-time tweet tracking, and a more powerful stream filtering and search query language.

This is the first in a series of blog posts on Snowflake, a cloud-based data warehouse.
MMS • Raul Salas
Snowflake is quickly being adopted by many companies migrating to the cloud. Launched in 2014, it is fast becoming the premier data warehousing solution for businesses of all sizes. As a cloud-based service for structured and semi-structured data, Snowflake has, in its own words, “combined the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions.” Snowflake's key strength is the separation of storage from compute.
This fee structure allows businesses to pay for compute only while it is in use, without the up-front cost of building a data platform of their own. For example, a virtual warehouse can automatically suspend when not in use, whereas many cloud-based data warehouse systems continue to bill even when idle.
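The cost impact of suspending idle compute can be sketched with some simple arithmetic. The credit rate and usage pattern below are illustrative assumptions, not actual Snowflake pricing; the point is that an auto-suspending warehouse accrues charges only for hours it actually runs.

```python
# Hedged cost sketch: pay-per-use vs. always-on billing.
# The credit rate and daily usage pattern are illustrative assumptions.

CREDITS_PER_HOUR = 1.0  # e.g., a small warehouse size (assumed rate)

def credits_always_on(hours_in_period):
    """An always-on system bills for every hour in the period."""
    return CREDITS_PER_HOUR * hours_in_period

def credits_auto_suspend(active_hours):
    """An auto-suspending warehouse bills only the hours it actually ran."""
    return CREDITS_PER_HOUR * sum(active_hours)

# A warehouse used for three short batch windows in a 24-hour day:
active = [0.5, 1.0, 0.25]
print(credits_always_on(24))         # 24.0 credits
print(credits_auto_suspend(active))  # 1.75 credits
```

For bursty workloads like nightly loads or business-hours dashboards, this difference compounds quickly over a month.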
By building everything on cloud infrastructure and reducing the need for costly hardware, Snowflake users can capitalize on a fast and powerful analytics platform to streamline their operations. Once set up and configured, an installation can be administered without a full-time database administrator or SAN storage administrator.
Furthermore, Snowflake can be a strong replacement for SQL Server, Hadoop, and Vertica in data warehouse use cases. Snowflake complements MongoDB by loading JSON document data natively, without requiring transformation to a fixed relational schema, saving time and headaches along the way. It also handles JSON data far more easily than alternatives such as AWS Redshift, thanks to native JSON storage. MongoDB data can be copied into Snowflake in small batches every few minutes to reduce the performance impact on the source databases, which improves the responsiveness and uptime of the Snowflake data warehouse. With effectively unlimited storage, coordinating with an internal storage team or on-premises group becomes unnecessary. In short, Snowflake is a premier cloud-based database in terms of both performance and total cost of ownership, and definitely a player to watch in the emerging database scene.
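The micro-batching pattern mentioned above can be sketched as follows. This is a minimal, self-contained illustration, not a real connector: the `MicroBatcher` class and its method names are hypothetical, and a real pipeline would stage each flushed batch (for example, as newline-delimited JSON files) and load it into a semi-structured column in the warehouse.

```python
# Illustrative micro-batching sketch (hypothetical helper, not a real
# connector). Documents accumulate in memory and are flushed as small
# batches, so the warehouse load runs every few minutes or N documents,
# not once per document.
import json

class MicroBatcher:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []
        self.flushed = []  # stands in for "batches loaded into the warehouse"

    def add(self, doc):
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Serialize pending docs as newline-delimited JSON, schema-free."""
        if self.pending:
            payload = "\n".join(json.dumps(d) for d in self.pending)
            self.flushed.append(payload)
            self.pending = []

batcher = MicroBatcher(batch_size=3)
for i in range(7):
    batcher.add({"_id": i, "status": "ok"})
batcher.flush()  # flush the final partial batch
print(len(batcher.flushed))  # 3 batches: sizes 3, 3, and 1
```

Because the documents stay as JSON end to end, no relational schema has to be designed up front; nested or varying fields survive the copy intact.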
If this sounds too good to be true, sometimes a good thing really is true. Future blog posts in this series will offer more in-depth technical deep dives.