Finished reading the Snowflake database documentation

Posted on May 16, 2017 by Bobby

I just finished reading the Snowflake database documentation and I thought I would blog about my impressions. I have not spent a lot of time using Snowflake so I can not draw from experience with the product. But, I read all the online documentation so I think I can summarize some of the major themes that I noticed. Also, I have read a lot of Oracle database documentation in the past so I can compare Oracle’s documentation to Snowflake’s. Prior to reading the online documentation I also read their journal article which has a lot of great technical details about the internals. So, I intend to include things from both the article and the documentation.

My observations fall into these major categories:

Use of Amazon Web Services instead of dedicated hardware for data warehouse
Ease of use for people who are not database administrators
Use of S3 for storage with partitioning and columnar storage
Lots of documentation focused on loading and unloading data
Lots of things about storing and using JSON in the database
Limited implementation of standard SQL – no CREATE INDEX
Computer science terms – from the development team instead of marketing?
Role of database administrator – understand architecture and tuning implications
Focus on reporting usage of resources for billing
JavaScript or SQL stored functions
Down side to parallelism – high consumption of resources
High end developers but still a new product
Maybe specialized purpose – not general purpose database

First let me say that it is very cool how the Snowflake engineers designed a data warehouse database from scratch using Amazon Web Services (AWS). I have worked on a couple of different generations of Exadata as well as HP’s Neoview data warehouse appliance and a large Oracle RAC based data warehouse so I have experience with big, expensive, on site data warehouse hardware. But Snowflake is all in Amazon’s cloud so you don’t have to shell out a lot of money up front to use it. It makes me think that I should try to come up with some clever use of AWS to get rich and famous. All of the hardware is there waiting for me to exploit it and you can start small and then add hardware as needed. So, to me Snowflake is a very neat example of some smart people making use of the cloud. Here is a pretty good page about Snowflake’s architecture and AWS: url

You would not think that I would be happy about a database product that does not need database administrators since I have been an Oracle database administrator for over 20 years. But, it is interesting how Snowflake takes some tasks that DBAs would do working with an on site Oracle database and makes them easy enough for a less technical person to do them. There is no software to install because Snowflake is web-based. Creating a database is a matter of pointing and clicking in their easy to use web interface. Non-technical users can spin up a group of virtual machines with enormous CPU and memory capacity in minutes. You do not setup backup and recovery. Snowflake comes with a couple of built-in recovery methods that are automatically available. Also, I think that some of the redundancy built into AWS helps with recovery. So, you don’t have Oracle DBA tasks like installing database software, creating databases, choosing hardware, setting up memory settings, doing RMAN and datapump backups. So, my impression is that they did a good job making Snowflake easier to manage. Here is a document about their built-in backup and recovery: url

Now I get to the first negative about Snowflake. It stores the data in AWS’s S3 storage in small partitions and a columnar data format. I first saw this in the journal article and the documentation reinforced the impression: url1,url2. I’ve used S3 just enough to upload a small file to it and load the data into Snowflake. I think that S3 is AWS’s form of shared filesystem. But, I keep thinking that S3 is too slow for database storage. I’m used to solid state disk storage with 1 millisecond reads and 200 microsecond reads across a SAN network from a storage device with a large cache of high-speed memory. Maybe S3 is faster than I think but I would think that locally attached SSD or SSD over a SAN with a big cache would be faster. Snowflake seems to get around this problem by having SSD and memory caches in their compute nodes. They call clusters of compute nodes warehouses, which I think is confusing terminology. But from the limited query testing I did and from the reading I think that Snowflake gets around S3’s slowness with caching. Caching is great for a read only system. But what about a system with a lot of small transactions? I’ve seen Snowflake do very well with some queries against some large data sets. But, I wonder what the down side is to their use of S3. Also, Snowflake stores the data in columnar format which might not work well for lots of small insert, update, and delete transactions.

I thought it was weird that out of the relatively small amount of documentation Snowflake devoted a lot of it to loading and unloading data. I have read a lot of Oracle documentation. I read the 12c concepts manual and several other manuals while studying for my OCP 12c certification. So, I know that compared to Oracle’s documentation Snowflake’s is small. But, I kept seeing one thing after another about how to load data. Here are some pages: url1,url2. I assume that their data load/unload statements are not part of the SQL standard so maybe they de-emphasized documenting normal SQL constructs and focused on their custom syntax. Also, they can’t make any money until their customers get their data loaded so maybe loading data is a business priority for Snowflake. I’ve uploaded a small amount of data so I’m a little familiar with how it works. But, generally, the data movement docs are pretty hard to follow. It is kind of weird. The web interface is so nice and easy to use but the upload and download syntax seems ugly. Maybe that is why they have some much documentation about it?

Snowflake also seems to have a disproportionate amount of documentation about using JSON in the database. Is this a SQL database or not? I’m sure that there are Oracle manuals about using JSON and of course there are other databases that combine SQL and JSON but out of the relatively small Snowflake documentation set there was a fair amount of JSON. At least, that is my impression reading through the docs. Maybe they have customers with a lot of JSON data from various web sources and they want to load it straight into the database instead of extracting information and putting it into normal SQL tables. Here is an example JSON doc page: url

Snowflake seems to have based their product on a SQL standard but they did not seem to fully implement it. For one thing there is no CREATE INDEX statement. Uggh. The lack of indexes reminds me strongly of Exadata. When we first got on Exadata they recommended dropping your indexes and using Smart Scans instead. But, it isn’t hard to build a query on Exadata that runs much faster with a simple index. If you are looking up a single row with a unique key a standard btree index with a sequential, i.e. non-parallel, query is pretty fast. The lack of CREATE INDEX combined with the use of S3 and columnar organization of the data makes me think that Snowflake would not be great for record at a time queries and updates. Of course, an Oracle database excels at record at a time processing so I can’t help thinking that Snowflake won’t replace Oracle with its current architecture. Here is a page listing all the things that you can create, not including index: url

Snowflake sprinkled their article and documentation with some computer science terms. I don’t recall seeing these types of things in Oracle’s documentation. For example, they have a fair amount of documentation about HyperLogLog. What in the world? HyperLogLog is some fancy algorithm for estimating the number of rows in a large table without reading every row. I guess Oracle has various algorithms under the covers to estimate cardinality. But they don’t spell out the computer science term for it. At least that’s my impression. And the point of this blog post is to give my impressions and not to present some rigorous proof through extensive testing. As a reader of Oracle manuals I just got a different feel from Snowflake’s documentation. Maybe a little more technical in its presentation than Oracle’s. It seems that Snowflake has some very high-end software engineers with a lot of specialized computer science knowledge. Maybe some of that leaks out into the documentation. Another example, their random function makes reference to the name of the underlying algorithm: url. Contrast this with Oracle’s doc: url. Oracle just tells you how to use it. Snowflake tells you the algorithm name. Maybe Snowflake wants to impress us with their computer science knowledge?

Reading the Snowflake docs made me think about the role of a database administrator with Snowflake. Is there a role? Of course, since I have been an Oracle DBA for over 20 years I have a vested interest in keeping my job. But, it’s not like Oracle is going away. There are a bazillion Oracle systems out there and if all the new students coming into the work force decide to shy away from Oracle that leaves more for me to support the rest of my career. But, I’m not in love with Oracle or even SQL databases or database technology. Working with Oracle and especially in performance tuning has given me a way to use my computer science background and Oracle has challenged me to learn new things and solve difficult problems. I could move away from Oracle into other areas where I could use computer science and work on interesting and challenging problems. I can see using my computer science, performance tuning, and technical problem solving skills with Snowflake. Companies need people like myself who understand Oracle internals – or at least who are pursuing an understanding of it. Oracle is proprietary and complicated. Someone outside of Oracle probably can not know everything about it. It seems that people who understand Snowflake’s design may have a role to play. I don’t want to get off on a tangent but I think that people tend to overestimate what Oracle can do automatically. With large amounts of data and challenging requirements you need some human intervention by people who really understand what’s going on. I would think that the same would be true with Snowflake. You need people who understand why some queries are slow and how to improve their performance. There are not as many knobs to turn in Snowflake. Hardly any really. But there is clustering: url. You also get to choose which columns fit into which tables and the order in which you load the data, like you can on any SQL database. Snowflake exposes execution plans and has execution statistics: url1,url2,url3. So, it seems that Snowflake has taken away a lot of the traditional DBA tasks but my impression is that there is still a role for someone who can dig into the internals and figure out how to make things go faster and help resolve problems.

Money is the thing. There are a lot of money related features in the Snowflake documentation. You need to know how much money you are spending and how to control your costs. I guess that it is inevitable with a web-based service that you need to have features related to billing. Couple examples: url1,url2

Snowflake has SQL and JavaScript based user defined functions. These seem more basic than Oracle’s PL/SQL. Here is a link: url

There are some interesting things about limiting the number of parallel queries that can run on a single Snowflake warehouse (compute cluster). I’ve done a fair amount of work on Oracle data warehouses with people running a bunch of parallel queries against large data sets. Parallelism is great because you can speed up a query by breaking its execution into pieces that the database can run at the same time. But, then each user that is running a parallel query can consume more resources than they could running serially. Snowflake has the same issues. They have built-in limits to how many queries can run against a warehouse to keep it from getting overloaded. These remind me of some of the Oracle init parameters related to parallel query execution. Some URLs: url1,url2,url3 In my opinion parallelism is not a silver bullet. It works great in proofs of concepts with a couple of users on your system. But then load up your system with users from all over your company and see how well it runs then. Of course, one nice thing about Snowflake is that you can easily increase your CPU and memory capacity as your needs grow. But it isn’t free. At some point it becomes worth it to make more efficient queries so that you don’t consume so many resources. At least, that’s my opinion based on what I’ve seen on Oracle data warehouses.

I’m not sure if I got this information from the article or the documentation or somewhere else. But I think of Snowflake as new. It seems to have some high-end engineers behind it who have worked for several years putting together a system that makes innovative use of AWS. The limited manual set, the technical terms in the docs, the journal article all make me think of a bunch of high-tech people working at a startup. A recent Twitter post said that Snowflake now has 500 customers. Not a lot in Oracle terms. So, Snowflake is new. Like any new product it has room to grow. My manager asked me to look into technical training for Snowflake. They don’t have any. So, that’s why I read the manuals. Plus, I’m just a manual reader by nature.

My impression from all of this reading is that Snowflake has a niche. Oracle tries to make their product all things to all people. It has every feature but the kitchen sink. They have made it bloated with one expensive add-on option after another. Snowflake is leaner and newer. I have no idea how much Snowflake costs, but assuming that it is reasonable I can see it having value if companies use it where it makes sense. But I think it would be a mistake to blindly start using Snowflake for every database need. You probably don’t want to build a high-end transactional system on top of it. Not without indexes! But it does seem pretty easy to get a data warehouse setup on Snowflake without all the time-consuming setup of an on premise data warehouse appliance like Exadata. I think you just need to prepare yourself for missing features and for some things not to work as well as they do on a more mature database like Oracle. Also, with a cloud model you are at the mercy of the vendor. In my experience employees have more skin in the game than outside vendors. So, you sacrifice some control and some commitment for ease of use. It is a form of outsourcing. But, outsourcing is fine if it meets your business needs. You just need to understand the pros and cons of using it.

To wrap up this very long blog post, I hope that I have been clear that I’m just putting out my impressions without a lot of testing to prove that I’m right. This post is trying to document my own thoughts about Snowflake based on the documentation and my experience with Oracle. There is a sense in which no one can say that I am wrong about the things that I have written as long as I present my honest thoughts. I’m sure that a number of things that I have written are wrong in the sense that testing and experience with the product would show that my first impressions from the manuals were wrong. For example, maybe I could build a transactional system and find that Snowflake works better than I thought. But, for now I’ve put out my feeling that it won’t work well and that’s just what I think. So, the post has a lot of opinion without a ton of proof. The links show things that I have observed so they form a type of evidence. But, with Oracle the documentation and reality don’t always match so it probably is the same with Snowflake. Still, I hope this dump of my brain’s thoughts about the Snowflake docs is helpful to someone. I’m happy to discuss this with others and would love any feedback about what I have written in this post.

Bobby

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.

View all posts by Bobby →

This entry was posted in Uncategorized. Bookmark the permalink.

23 Responses to Finished reading the Snowflake database documentation

Mike Conklin says:

August 10, 2017 at 7:27 am

Thanks for the info. It’s a good place to start for someone like me. I’m currently an Oracle dba with the focus on the application. We have system type dba’s that take care of backups, version control, migrations and such. I do a lot of tuning and writing custom queries for customers with specific needs. My manager is technical and is seriously considering snowflake mainly for our analytical needs. Right now we are keeping our Oracle environment in tact and are looking to offload data to snowflake for analytics. We may convert everything to snowflake eventually but that is way down the road. So any info on exporting things to snowflake from Oracle would be of help as well.

I’ve been using Oracle since version 6.

Reply
- Bobby says:
  
  August 10, 2017 at 7:58 am
  
  Thanks for your comment Mike. I think that ultimately we pull data out of Oracle into a flat file, upload that file to AWS S3 storage, and then pull it in to Snowflake. I think that we use Informatica to massage the data that comes from Oracle before dumping it into a flat file. We had an initial load of several hundred gigabytes of data using a so called Snowball appliance. I think it is just a large USB drive. But, we extracted the historical data into flat files using sqlplus and gzipped them and copied them on to the appliance and Fedexed it to someone. I think it was Amazon. But, the key thing is that we are getting the data from Oracle into flat files at some point and that gets loaded into Snowflake.
  
  I started with Oracle 7.0.16 and an Oracle 6 DBA coworker first got me into Oracle years ago. I remember discovering the analyze command. 🙂 I’ve seen Snowflake run some of our Exadata queries a lot faster but we have complex VPD policies on Exadata that aren’t on Snowflake so I’m not sure that it is a fair comparison. Without indexes I can’t see Snowflake meeting general purpose database needs. 🙂
  
  Reply
John Ryan says:

March 20, 2018 at 3:45 am

Hey Bobby,

Thanks for writing up the views on Snowflake. Interested to see another independent viewpoint.

You’re right, Snowflake is not designed to be a general purpose database. I think it’s based upon the principle that “One size no longer fits all” in that the general purpose database solutions from Oracle, Microsoft and IBM are now being eclipsed by dedicated challenger technologies. These
include “NewSQL” databases from VoltDB, MemSQL and CockroachDB which address the OLTP space, and analytic warehouses including Vertica, AWS Redshift and Snowflake which implement a “columnnar” based storage mechanism and are entirely dedicated to Data Warehouse systems.

I’ve spent some time getting to grips with the huge changes in the database industry, and write a paper on LinkedIn about “NewSQL” technology. I’m now researching the Data Warehouse space, and came across your blog on Snowflake.

Your right about Snowflake having some “High End Engineers” behind it. As well as being headed by a ex-Microsoft Exec, it has a number of ex-Oracle technical evangelists.

Must admit, I think it’s pretty impressive from what I’ve seen so far. It largely follows the direction laid out by Dr. Michael Stonebraker (MIT winner of the Turing Medal for computing). I’ve included a couple of links to his papers at the end.

He argues that a database should be a dedicated system (OLTP, DW and streaming), and forgo all the “tuning knobs” you find on the likes of Oracle that have developed over the years (there’s nearly 500 parameters to tune).

It’ll be interesting to see how this plays out – as Oracle have since announced (yet to launch), a data Warehouse as a Service in the Cloud which is self-tuning and self-operating).

Looks to be an interesting time to work in the database industry.

Cheers,

John

References:

Oracle Vs. NoSQL Vs. NewSQL – Database Technology 3 of 3

A comparison of Oracle, NoSQL technology and the NewSQL options available. Advantages and drawbacks

https://www.linkedin.com/pulse/oracle-vs-nosql-newsql-database-technology-3-john-ryan/

The End of an Architectural Era
http://nms.csail.mit.edu/~stavros/pubs/hstore.pdf

One Size Fits all: An Idea Whose Time Has Come

https://www.semanticscholar.org/paper/%E2%80%9COne-Size-Fits-All%E2%80%9D%3A-An-Idea-Whose-Time-Has-Come-a-Stonebraker-%C3%87etintemel/0f6303e406690eeec5e2b9e14b9d6017e975cc8d

Reply
- Bobby says:
  
  March 20, 2018 at 7:58 am
  
  John,
  
  Thank you for your detailed comment. I am biased towards Oracle because I have worked with Oracle databases for most of my career and my income depends on my employer using Oracle. But, hopefully I can set my bias aside enough to say what I really think about database technology. The simplest way to say it is that I think that automation is much more limited than people say in their sales pitches. I spent years of my career trying to understand why Oracle would not run straightforward PeopleSoft queries in the way that I could tell was best. I finally began to understand how Oracle’s optimizer works after reading articles and books. Now I think that not only Oracle’s optimizer, but any SQL optimizer has vast limits on which queries they can run optimally. Optimizers can not in a short period of time anticipate all of the relationships that exist in the data of large and complex databases.
  
  So, when I hear of autonomous data warehouses and getting rid of all of the knobs that you can turn I am highly skeptical. After decades of development SQL optimizers still do not work. I believe that they can not. For the same reasons I think that databases can not be automated in other respects. Database companies will fool people with their claims of automation and make a lot of money but at the end of the day they will have to add the knobs back or else companies will move back to solutions that have them.
  
  Automation is great when it works, but you have to have ways to manually override it. That’s what I really think.
  
  Bobby
  
  Reply
  - Alex Katykhin says:
    
    August 7, 2018 at 11:18 pm
    
    Hi Bobby,
    
    I totally support you in this reply in particular and in most of the thoughts you expressed in this post. It was extremely helpful for me desperate to find any comparison or honest analysis of what snowflake really has under cover and what options can be really beneficial when moving away from Oracle to Snowflake. Still I’m trying to find more but of course personal experience can be more revealing.
    And I’m also DBA by the way.
    Thank you once more and hope to hear more from about the subject-)
    
    Alex
    
    Reply
    - Bobby says:
      
      August 8, 2018 at 8:54 am
      
      Thanks! I wish I could tell you that I had learned a lot more about Snowflake since the post but really Snowflake is so easy to use that our development teams have not needed any DBA help. It is so easy to create a new database and all that a lot of the things that a DBA would do are no longer needed. I keep thinking that I could help by digging into difficult performance issues but so far no one has asked me to do so. Whatever its limitations Snowflake is good at giving non-DBAs ability to spin up large DBs.
      
      Reply
Anonymous says:

August 21, 2018 at 1:17 pm

I really like your article, but have a few observations.

I too have spent many years tuning Oracle, SQL server, UDB DB2, mySQL …
Am currently looking at Cockroach DB and Snowflake DB.

One size does not fit all. Right tool for the Job?

One of the biggest issues with Snowflake DB is that people confuse it for an OLTP DB.
o Similar to Oracle, SQL Server UDB, DB2, MySQL.
o It is NOT
o It will never compete in this area. For that look to cockroach DB?

It is a Not Only SQL OLAP DB
o Meant to compete with Redshift, Big Query, …
o Column oriented, which is bad for OLTP,
but significantly improves the performance of OLAP.
o Native support for tabular, and semi-structured data (XML jason, avreo …)
Data Lake people eat this up
It stores AND automatically flattens semi-structured data to derived columns for rapid search.
o NOT a SQL DB, but rather a DB that supports SQL syntax.
Why create a new query language?
o No Indexes?
This greatly simplifies design and implementation complexity.
They believe micro partition statistics go a long way to reducing the need.
Time will tell.
o Statistics on micro partitions allow quick pruning of actual data you scan for a query.
This, with column oriented storage quickly reduces the data you need to retrieve from S3.
o DML operations require a table lock!
How will that ever scale?
Not an issue if you use it as a OLAP DB and are just doing a lot of data loads.
o Clustering the table is about the only real DBA type tuning.
While this might be a job for a DBA, Someone could just write an adviser script.
o App teams can scale up or out Warehouse to manage SLA. (Without a DBA 🙁 )
o Some have said the sweet spot is 500GB and beyond.
But I’d say any size OLAP DB that requires semi-structured document support

While there is little/no documentation on Physical file structure,
o I suspect they are using a git like content addressable system
o Each ‘commit’ gives a new view of state. If true, brilliant!
o This enables consistent snapshots without ‘Undo’
o This supports point in time fallback or queries.

HyperLogLog.
o They mention it so people will know they can do cardinality estimates from statistics
o People looking for a modern OLTP system demand this.
o Best case they can derive counts from micro-partition statistics
o Worst case it allows parallel queries to be more efficient
o FYI Oracle 12cR2 documentation talks a lot about this to reduce the size of partition statistics

Reply
- Bobby says:
  
  August 21, 2018 at 1:37 pm
  
  Thanks for your feedback! I’m not sure that I agree with everything you said but I appreciate the input. Really, it will be interesting to see how things go over time. I can’t help thinking that with any kind of database it helps to have someone who understands how it works. Maybe with cloud dbs we will be stuck with the vendor providing the database experts but I doubt that will scale. I don’t think it will work with Snowflake or any of its competitors including Oracle’s automated cloud offerings. But we shall see. The DBA is dead! Long live the DBA! 🙂
  
  Bobby
  
  Reply
  - Anonymous says:
    
    August 27, 2018 at 8:39 am
    
    Just curious, do you agree that this is a OLAP and not OLTP DB?
    
    As you mention, there will always be work for good DBA’s
    But many of the latest generation are manually running tuning advisers or doing what they have always done. Much of this should be automated.
    
    As you say it remains to be seen how well no index work. But micro partitions with effective pruning and the fact that S3 allows them to retrieve a proton of an object (say a single column) will greatly reduce the amount retrieved from S3. That combined with local caching to SSD may be the secret sauce. Given Amazon charges for S3 data movement and they don’t pass it on to us, implies this is an area they hope to minimize and profit on.
    
    Reply
    - Bobby says:
      
      August 27, 2018 at 8:56 am
      
      I agree that Snowflake is geared to OLAP and not OLTP as best as I can tell. The hard thing about this and any database performance issue is that you really don’t know until you test things. I was at an SAP HANA presentation where SAP was talking about using HANA for OLTP even though it uses in memory columnar storage for the data. I think Workday is similar. In my mind these methods of storing the data doesn’t make sense for OLTP but my intuition is often wrong so I would have to really do some nitty gritty testing to know. And the challenge that I feel in all of my database work is that it just takes time. Even if I have the ability to dig into something and figure it out do I have the time and energy to do it. I have not done that for Snowflake and OLTP so all I can go on is my best guess based on what I have read about its architecture.
      
      Thanks again for commenting on my blog post. I appreciate everyone who participates in the blog.
      
      Bobby
      
      Reply
Phil John says:

September 27, 2018 at 4:54 am

As others have echo’d here, you’re approaching Snowflake from the wrong direction. It’s not there to power a realtime transactional system, it’s there for when you have many PB of data that you need to run statistical analysis on, and for that, slower reads from S3 are not an issue, because it doesn’t matter that it takes 10 minutes to run a query, what matters is the result.

Trying to do what Snowflake is designed to do in the PB scale on a traditional DMBS just doesn’t scale cost effectively – from a software licensing perspective, or hardware perspective. If you’re ingesting multiple terabytes per day of data, just adding new disks to store it would be a full time job for several people, plus you’d need a lot of empty space in a data centre rack that you can expand into – and if you can’t expand any further, well, looks like you need to move it all somewhere else.

With a solution like Snowflake you can run these intensive data analyses and just keep scaling elastically – cloud providers such as Backblaze already hit 200PB back in 2015, and it’s probably that S3 has several orders of magnitude more data stored, and a whole team dedicated to expanding it in front of demand.

Reply
- Bobby says:
  
  September 27, 2018 at 8:24 am
  
  Thanks for your comment. It is an interesting perspective. I’m not sure how much Snowflake costs when you ingest PBs of data and use the corresponding amount of CPU and memory. All of that hardware has to cost something. Also with PB of data on S3 the first time a query is run it will be slow as can be until the data is cached.
  
  Reply
Steve Ring says:

September 28, 2018 at 11:34 am

I evaluated Snowflake for my company comparing it to Oracle, Oracle Exadata, Teradata and Redshift and found it to be one of the most innovative products I have seen. Along with some of the positives that you have mentioned, the most important feature that it has is distinct separation of storage and compute with both able to be elastically scaled up and down. You can have a very small compute cluster(a warehouse, bad name as you say) querying any part of a 200 TB dataset or a very large compute cluster querying any part or all of a 1 TB dataset. We have an Oracle SuperCluster on prem which has 128 cores and 150 TB of Exadata storage. If my workload requires 129 cores for satisfactory performance or I reach 151 TB of data, I am out of luck unless I want to buy more hardware. If I have a very small workload and only use 5 TB of storage I have still paid for the whole M7 Supercluster. Similarly with Redshift, even being in the cloud, if I provision a dense compute cluster each node can fit 2.5 TB of my data. This means that if I have 100 TB of data, I always need and have to pay for a 40 node cluster whether I need the compute or not.

With Snowflake, we do not have to know what our workload is because we can scale our storage to whatever we want and change from a medium warehouse to 5 medium warehouses or a medium warehouse to an extra large warehouse virtually on the fly. We even do this during our weekday process and from weekday to weekend. We run a 3XL warehouse to load our data and when done we turn the warehouse off and pay for the hours we used it. Our users utilize a large warehouse during business hours and we have an extra small warehouse running for them from 7 PM to 7 AM(and on weekends) in case they get up at 2 AM and want to run a query. We truly pay for what we use and that now becomes one of the main Snowflake DBA tasks, to make sure we are not running warehouses that are more than what we need. This is a task much easier than finding out what statistics are missing, what index is missing, what SQL query is malformed etc. etc. etc.

The other Snowflake feature that helps us very much is the way it handles unexpected concurrency. Snowflake clusters can detect queueing and spin up new clusters to handle additional concurrent users. If you have between 5-10 concurrent queries running and for some reason get 50 concurrent users from 10-10:30, Snowflake automatically gives you more compute during that time period and you pay for that compute for that time period. Exadata has IORM, Teradata has TASM but both just prioritize the workload that you cannot handle, they do not give you more resources. Redshift has nothing for this and your most important users just wait.

I am not a Snowflake salesman, just a 20 year Oracle DBA and lover of Oracle as it sounds like you are. I get presented new DBMS products all the time and am usually rolling my eyes while they are being presented. As was said, OLTP is not its specialty and it will not help you if all you run are SELECT * queries or queries with no predicates but Snowflake has some very unique and valuable features .

Yes the expert database tuner is a dinosaur. I have been doing what you have been doing for 25 years and there aren’t many of us left who can tell you why the cost based optimizer does what it does and that is what first led us to Snowflake. A real performance tuning expert is hard to find especially when more and more support models focus on having low cost resources. Given that we are headed in that direction, a database with no indexes and “no knobs to turn” will become a necessity in 10 years when people like you and me are retired.

Reply
- Bobby says:
  
  September 28, 2018 at 11:46 am
  
  Wow! Thank you for taking the time to post a detailed comment. I wonder though if SQL tuning/database tuning will really go out of style. My guess is that over time Snowflake will expose more of its internals to customers and give them more knobs to turn for tuning. I think there are limits to how much you can automate SQL and database tuning no matter how good the database code is. But, I can’t prove it and time will tell. Thanks again!
  
  Reply
- Alex says:
  
  September 28, 2018 at 1:21 pm
  
  Hi Steve,
  
  That’s really interesting to hear about Snowflake. Can you please also share some information about what kind of queries Snowflake is processing? I can see what queries are being run in our Data Warehouse and those are quite complex and that complexity is part of the problem of course. But this complexity allows users to get what they need even if it can not be done fast enough. Has Snowflake similar capabilities and does it allow users to build complex reports and what kind of features it provides in its query language?
  
  Reply
Anonymous says:

October 4, 2018 at 9:45 pm

Snowflake is built to perform as a warehouse. Implies all I/o. Large scale. Large tables. Wide tables. Parallel processing. Pruning. Pinnacle so far for data warehouse design. Includes Lots of loading, indexing not required – why bother with an index when a query reads all data. Thus the columnar (vertical partitioning). Warehouses are simple. Adding complexity makes them not warehouses. Small Dml commands r OLTP and have no use in a warehouse. Now done with oracle mans and snowflake. Read about netezza. Dedicated warehouse appliance. That’s it’s purpose. Sucks at non warehouse. Focus on netezza hardware architecture and why it’s built that way. – for I/o througput speed. After that read about Hadoop abd HDFS architecture. 😀

Reply
- Bobby says:
  
  October 6, 2018 at 6:15 am
  
  Thanks for your comment. I don’t agree without about indexes not being needed in data warehouses. I’ve worked on an Exadata data warehouse for a number of years and we still use indexes to get the performance we need in some cases. I think there are limits to what you can do with parallel processing. At some point you need to limit the amount of data that you query so you don’t consume so many resources.
  
  Bobby
  
  Reply
sivaraj says:

November 9, 2018 at 11:00 pm

Hi Bobby,

hope all good at your end.I am just looking for small info.Is there any other place we can find detail tutorial for snowflake other than snowflake site.can you send me if you know any sites..
Thanks

Reply
- Bobby says:
  
  November 10, 2018 at 3:22 am
  
  Thank you for leaving a comment.
  
  Last time that I checked there was not much in the way of training and documentation about Snowflake outside of their web site. That is one of the disadvantages of working with a new database like Snowflake instead of an established one like Oracle. But, over time their customer base could grow and some things like that could improve.
  
  Bobby
  
  Reply
Little Bobby Tables says:

December 3, 2018 at 10:46 am

“I think it would be a mistake to blindly start using Snowflake for every database need. You probably don’t want to build a high-end transactional system on top of it”

As many have pointed out, you have really missed the mark about what Snowflake is and does. I would recommend taking your Oracle glasses off and rereading the documentation.

Reply
- Bobby says:
  
  December 3, 2018 at 12:26 pm
  
  Thanks for your comment. I appreciate anyone that takes the time to read my blog and even more if they comment on it.
  
  There is no doubt that I have an Oracle bias since I have been working with Oracle for more than 20 years. Also, all I did was read the manuals and a paper and poke around a little bit. I’m not a Snowflake expert, but I think I understand the architecture pretty well assuming that the documentation is accurate. You have to test things to really know how they work. That is certainly true of Oracle and its documentation. I haven’t done much testing of Snowflake.
  
  I am not sure if you think that Snowflake does support transactional applications well, or if you think it was a mistake for me to imply that people might try to use Snowflake for a transactional application. The reality is that I haven’t tested Snowflake with a transactional workload so I don’t know for sure how it would do. It just didn’t seem suited for that sort of application based on the reading I did and my experience with other databases. But, you have to test it to know for sure so I could be wrong.
  
  Bobby
  
  Reply
- Alex says:
  
  December 3, 2018 at 2:06 pm
  
  @Little Bobby Tables, so maybe you can shed the light of your personal enlightenment in the area of Snowflake as it seems like you have proven cases when and what Snowflake does and how it rocks?
  
  Reply
Pingback: Cloud Native (Bookmarks) | Elitist

Finished reading the Snowflake database documentation

About Bobby

23 Responses to Finished reading the Snowflake database documentation

Leave a Reply Cancel reply

Subscribe to Blog via Email

Archives

Bobby Durrett\’s DBA Blog

Orafaq blog aggregator

Finished reading the Snowflake database documentation

Share this:

About Bobby

23 Responses to Finished reading the Snowflake database documentation

Leave a Reply Cancel reply

Subscribe to Blog via Email

Archives

Bobby Durrett\’s DBA Blog

Orafaq blog aggregator