Columnstore index in sql server (tutorial)
Articles,  Blog

Columnstore index in sql server (tutorial)


In this video we will try to understand what exactly is the use of Columnstore index and in which scenario we should be using it. How does this columnstore index is different from a clustered and non-clustered indexes? Incase you are new to clustered and non-clustered index please watch the clustered and non-clustered index video in the Learn SQL Server section. First will try to understand how this index different from clustered and non-clustered indexes. In Clustered and non-clustered indexes we have an hierarchy of the Root node and the root node has an Intermediate node and Intermediate has a Leaf node The Leaf node points towards a data and the data is stored into 8 KB pages. We stored in a row format. On the screen we have a very simple table called as tblcustomer We have inserted lot of dummy records into this. On this table also we have an index. This table is having just 3 fields There is an Id , CustomerName and CustomerCode. On Id there is a primary key and that primary key has a Clustered index. We have the Leaf node and the leaf node is pointing to these records and these records are stored in a row format in 8 KB pages. The first 5 records will go into the first 8 KB page the second 5 records go into the second 8 KB page. We can have multiple pages out there and in each one of these pages we have these records fitted according to the size of the row. Let’s assume that we are fetching lot of records We are fetching some columns from the first 8 KB page and fetching some columns from the second 8 KB page so on… If we are actually querying lot of 8 KB pages or scanning lot of 8 KB pages that will increase the IO. If we can rearrange this row type thing into a column type thing then the IO can decrease. We can fit one column into one 8 KB page. From the first 8 KB page and from the second 8 KB page we can take one column customer name and fit it into one page. Then we can have a second page in that we can fit the second column. Basically, Columnstore index is where rather than fitting a full row into one page we try to just fit one column into one page. By doing so the input output will decrease. If we fire a query saying “we just want CustomerName so we do not have to scan lot of 8 KB pages. We can just scan one or two 8 KB pages and get the whole columns from that. Columnstore indexes increases the select performance by fetching from less number of 8 KB pages because the structure of storage is more column wise then the row wise. Because we are storing only one type of data or one column into one page we have lot of scope for compression. Let us say we have records like India, India, US, US, India , India we can say all these India’s means one. We can store somewhere in some other place India and we can have reference value 1 stored here with that the size of the record can decrease and we can fit large number of records into one page. Again there is lot of scope of compression over here as well. That’s what Columnstore indexes does. First, it arranges the data in column wise per page and second it also applies compression. By applying compression we can store large number of rows in the same page. By storing large number of records in the same page we don’t have to again query large number of pages because we can take the data from one or two page because it is already compressed and it is fitting lot of data into one page. But said and done creation of columstore indexes is definitely very heavy process because we have to arrange the data into the pages using column structure we have to compress and so on. If the database is highly transactional then columnstore indexes can be bad. Because for the record inserted it has to compress it, it has to align them into the pages. and that’s a tedious work. When it comes to selecting records or retrieving or reporting kind of scenario then definitely columstore indexes score on that. If we have an OLAP system. OLAP or analytical system where the customer is more fetching the records doing analysis then we should use columstore indexes. If we are having OLTP system where there is high transactions happening every minute then columnstore indexes could go really bad on that. That’s why columstore indexes are meant specially for projects like SSAS, SSIS and so on. where people built OLAP system. Let us practically see how much is the benefit of columnstore indexes when it comes to retrieval Select CustomerName and CustomerCode from the tblCustomer and see what kind of plan are we getting if we try to get all the records from the table. If we have lot of records on this table. It is saying the plan we are using at this moment is clustered index scan. The estimated IO cost is 3.66. We have moved the mouse on the clustered index scan operator. It is telling the IO cost is 3.66. In order to see this we have to hit over here. This display estimated execution plan. At this moment the IO cost is the most important thing because columnstore indexes as per the funda should have less IO cost because it is going to fetch from less number of 8 KB pages. Will create a columnstore index To create a columnstore index here is a syntax “CREATE NONCLUSTERED COLUMNSTORE INDEX with the name and on which columns we want to create the columnstore index. Once we fire this query it will take the records of CustomerCode and try to fit it into the 8 KB pages. For CustomerCode there will be separate 8 KB pages where only CustomerCode will be stored. For CustomerName there will be separate 8 KB pages where CustomerName will be stored. Go to the design mode here and check if the columnstore index has been created or not. The MyColumnStore has been created and the type is Columnstore. Even we can create from the UI also. Copy this and see the plan over here. This plan is using a clustered index scan, we can remove order by id and just select Customer scan. We just fired it, we were trying to keep two window where we can compare It was 3.66 something If we move the mouse here we can see the Estimated IO cost has gone to 0.0031. If we look at the numbers there is a vast difference. When we use columnstore indexes the IO cost comes down and the select query becomes more better in terms of performance. The syntax is quite simple. The way we are writing syntax in Clustered and Non-Clustered we have to write exactly like that. The only thing we need to put here is the Columnstore keyword. For example, if we want to create a non-clustered index this is the syntax “CREATE NONCLUSTERED INDEX then the columns. The only thing we need to put is Columstore. The columstore indexes will get created. Also we can create columstore indexes by using the design mode. Incase we are not able to remember syntaxes then we can use design mode. Let us revise what we have learnt. First thing Columnstore Indexes puts one column into one page. Because of that the IO decreases because now we will be fetching from less number of pages as compare to the row kind of architecture. Second thing is columnstore indexes also does compression , by doing so it tries to fit as many as records possible into less amount of pages. The final thing is columstore indexes is very good for OLAP type of system. If it is OLTP type of system then we will have performance issues because columnstore indexes does lot of background work like Compressing, Aligning and those kind of things. In this video we were trying to understand what is columnstore index and how it differs from normal clustered and non-clustered index. Thank you very much.

12 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *