Tuesday, July 17, 2012

The amazing TRAILING SPACE theory

Sql server ignores trailing spaces while comparing using WHERE and HAVING clauses .However this behavior is different for LIKE clause.

Lets some examples on it

 create table #temp ( a varchar(10) )  
 insert into #temp values(' a ')  
 select COUNT(*) from #temp group by a having a=' a ' ->1  
 select COUNT(*) from #temp where a=' a     ' ->1  
 select COUNT(*) from #temp where ' a        '=a ->1  
 select COUNT(*) from #temp where a='a' ->0  
 select COUNT(*) from #temp where a like ' a ' ->1  
 select COUNT(*) from #temp where a like ' a' ->1  
 select COUNT(*) from #temp where a like ' a '->0  
 select COUNT(*) from #temp where ' a ' like a ->1  
 select COUNT(*) from #temp where ' a' like a->0  
 drop table #temp  

For LIKE sql has following principle when the string is ASCII(i.e. for varchar and char etc…)

Msdn:

When you perform string comparisons by using LIKE, all characters in the pattern string are significant. This includes leading or trailing spaces. If a comparison in a query is to return all rows with a string LIKE 'abc ' (abc followed by a single space), a row in which the value of that column is abc (abc without a space) is not returned. However, trailing blanks, in the expression to which the pattern is matched, are ignored. If a comparison in a query is to return all rows with the string LIKE 'abc' (abc without a space), all rows that start with abc and have zero or more trailing blanks are returned.

 

However when string is Unicode(nvarchar…) the above rule is not applicable. i.e. there is no significance of trailing spaces and matching is done exactly as it is done for ASCII.

Lets try above example on a Unicode string

 create table #temp ( a nvarchar(10) )  
 insert into #temp values(' a ')  
 select COUNT(*) from #temp group by a having a=' a ' ->1  
 select COUNT(*) from #temp where a=' a     ' ->1  
 select COUNT(*) from #temp where ' a        '=a ->1  
 select COUNT(*) from #temp where a='a' ->0  
 select COUNT(*) from #temp where a like ' a ' ->1  
 select COUNT(*) from #temp where a like ' a' ->0  
 select COUNT(*) from #temp where ' a ' like a->0  
 select COUNT(*) from #temp where ' a' like a ->0  
 drop table #temp  

So LIKE is used for pattern matching and not for comparisons. And we need to be extra careful when using LIKE and might take care of trailing spaces using RTRIM to be very sure of correct data retrieval.

For where clause trailing spaces are ignored as can be seen in example and RTRIM can be avoided.

Now we need to remember that string storage is affected by ANSI_PADDING setting which is by default ON .But the comparison is not affected. Again as storage changes , pattern matching(LIKE) changes accordingly. This is very important to see.

 SET ANSI_PADDING OFF  
 create table #temp ( a varchar(10) )  
 insert into #temp values(' a ')  
 select DATALENGTH(a) from #temp ->2  
 select COUNT(*) from #temp where a=' a    ' ->1  
 select COUNT(*) from #temp where a=' a' ->1  
 select COUNT(*) from #temp where a like ' a ' ->0 (important ..)  
 drop table #temp  

 

So storage is done by trimming trailing spaces when ANSI_PADDING is OFF.

 

 

 

 

Friday, July 13, 2012

Sql server neglects trailing spaces

create table #temp ( a varchar(10) )

insert into #temp values('  a                ')

select COUNT(*) from #temp where a='  a'
select COUNT(*) from #temp where a='a'
first query will return 1
2nd will return 0 as sql server will only ignore trailing spaces

So we can avoid use of RTRIM in many situations .

The NULL Dilemma : The NULL is not equal to NULL : Part-3

In part 1 and part 2 we learned about NULL propagation , ISNULL,COALESCE, tristate logic etc.
Now let’s learn few remaining parts.
SET ANSI_NULLS
As we saw comparing NULL with another NULL returns  UNKNOWN . This is when ANSI_NULLS is ON .
This is ANSI standard to have this value as ON. But in older sql server versions  we can get rows returned when we query as WHERE col=NULL .This is in legacy codes . As Microsoft is deprecating this feature avoid using it.In future versions it will always be ON.
More in

NULLIF :
NULLIF function takes 2 arguments and returns null if they are equal , else it returns the first argument.
The 2 arguments should be of same datatype or convertible types.
Eg.
SELECT NULLIF(‘satya’,’saty’)  => ‘satya’
SELECT NULLIF(‘satya’,’satya’)  => NULL
Below is an apt scenario where it can be used

Say we are doing a division operation and want to select NULL if divisior is 0 then we can make use of NULLIF function.

SELECT   col1/NULLIF(col2,0) from table1


Few other facts:
  • A primary key column can’t have NULL values.
  • A Unique key column can have maximum 1 NULL values(Unless a filtered unique index is used…)
Eg.
 create table #temp  
 (  
 a int unique ,  
 b int  
 )  
 insert into #temp values(1,11)  
 insert into #temp values(2,22)  
 insert into #temp values(NULL,33)  
 insert into #temp values(NULL,44)  

The last insert is not allowed .

How ever below will allow duplicate NULL
 create table #temp  
 (  
 a int,  
 b int  
 )  
 CREATE UNIQUE INDEX ind ON #temp(a)  
 WHERE a IS NOT NULL  
 insert into #temp values(1,11)  
 insert into #temp values(2,22)  
 insert into #temp values(NULL,33)  
 insert into #temp values(NULL,44)  

  •  For GROUP BY and DISTINCT,  NULL behave differently .For example for DISTINCT  columns with NULL are not distinct(which should be as 2 NULLs cannot be compared to same)
 create table #temp  
 (  
 a int,  
 b int  
 )  
 insert into #temp values(1,11)  
 insert into #temp values(2,22)  
 insert into #temp values(NULL,33)  
 insert into #temp values(null,44)  
 insert into #temp values(1,55)  
 select distinct(a) from #temp  
 o/p  
 NULL  
 1  
 2  
 select a,SUM(b) from #temp group by a  
 o/p  
 NULL   77  
 1       66  
 2       22  
 drop table #temp  

Friday, July 6, 2012

The NULL Dilemma : The NULL is not equal to NULL : Part-2

In previous post(part 1) we learned about tristate logic for NULL ,Not Exists and Null propagation.In this post I will be explaining about ISNULL ,COALESCE, IS NULL, NULL relation with aggregate functions etc.

First of all lets create a temporary table to explain
CREATE TABLE #temp 
 ( 
 name varchar(20), 
 age int 
 ) 
 INSERT INTO #temp VALUES('satya',26) 
 INSERT INTO #temp VALUES('biswa',NULL) 
 INSERT INTO #temp VALUES('panu',15) 
 INSERT INTO #temp VALUES('rahul',NULL) 
 INSERT INTO #temp VALUES('devi',35) 


In last post I said any expression involving NULL becomes NULL. Like NULL +10,NULL/0 etc. But aggregate functions like SUM ,AVG just ignore NULL while computing.

eg.
SELECT AVG(age) FROM #temp - >25

SELECT SUM(age) FROM #temp - >76

IS NULL/IS NOT NULL Predicate

If requirement is select record from table #temp for which age is equal to NULL ..
First temptation is to use below query

SELECT * FROM #temp where  age = NULL

But as I explained in previous post about NULL propagation this wont return any records.The IS NULL operator is to be used in this kind of senario.

SELECT * FROM #temp where age IS NULL

This will fetch all record where age is NULL .
o/p
biswa      NULL
rahul        NULL

Now I want records for which age is not NULL.

SELECT * FROM #temp where age IS NOT NULL
OR
SELECT * FROM #temp where NOT age IS NULL

This will fetch all record where age is not NULL .
O/P
satya       26
panu        15
devi         35

ISNULL/COALESCE Function:

ISNULL function takes 2 arguments and returns the first non-null value.

Syntax:
ISNULL(arg1,arg2)  (takes 2 arguments)

Here if arg1 is not null it will return arg1 ,
 if arg1 is null, arg2 is not null it will return arg2
else if both are null it will return NULL.

The dataype of return value is same as datatype of arg1.

So if I want records with NULL values I can write as
SELECT * FROM #temp where ISNULL(age,0)=0

To get records without NULL column
SELECT * FROM #temp where ISNULL(age,0)!=0

COALESCE function work similar to ISNULL but it can take anynumber of arguments .
Syntax:
COALESCE(arg1,arg2,arg3,arg4)

Here return value will be the first non-null argument.

Eg.:  SELECT * from #temp where COALESCE(age,NULL,0)=0

As it can take any number of parameters it has more advantage compared to ISNULL.Also COALESCE is ANSI standard .
Another very important fact is ISNULL returns data with a datatype same as datatype of first parameter.This causes problem in some situations.

Eg. Add one more row in #temp
INSERT INTO #temp VALUES(NULL,35)
SELECT ISNULL(name,'satyajit mohanty is the author') FROM #temp where ISNULL(name,'')=''
Will return
‘satyajit mohanty is’

Remember ISNULL will return data with same datatype as argument 1. Here it is varchar(20) .So o/p is a varchar with length 20.This would lead to wrong results when you tru to us this data or try to insert it into some other table.

On the other hand COALESCE will return data with a datatype  which is same as datatype of argument in accordance with highest datatype presedence.

Just replace ISNULL with COALESCE and check the output which will be correct one. 
Also try this
COALESCE(name,'satyajit mohanty is the author',0)
Remember datatype would be datatype of argument in accordance with highest datatype precedence before running above expression in a query.

So try to avoid ISNULL esspecially when you are using the output from ISNULL in further manipulation/storage.

Strange:

When try to run below queries

 CREATE TABLE #temp1
 ( 
 class int NOT NULL, 
 age int  NULL,
 mark AS  ISNULL (class,age) Primary Key
 )
 CREATE TABLE #temp1
 ( 
 class int NOT NULL, 
 age int  NULL,
 mark AS COALESCE(class,age) Primary Key
 )

Here the columns with ISNULL/COALESCE are called as computed columns.It is strange that first query will run sucessfully where as query 2 wont . I ran this on sql server 2008 . Don't know the reason yet.If you find/know the reason please post a reply.

So in this case ISNULL is unavoidable.

In next article we will see on NULLIF as this article already getting sufficiently large :) .