Monday, June 25, 2012

The NULL Dilemma : The NULL is not equal to NULL : Part-1

In this post I will discuss few interesting facts and problems with NULL in sql server .

Null signifies ‘absence of any data value’ .In other words value is missing  or value is unknown. It is important to understand that NULL is not zero or empty value .

NULL propagation
An expression involving NULL will result in NULL .(with few exception which I will discuss in part 2).This is called NULL propagation.
As I said NULL means value is unknown, so if we try to do some operation on unknown data it should give me another unknown data, isn't it ?

Eg.     Try below in sql server

NULL + 100   ->  add 100 to an unknown value.  Clearly I ll get another unknown value i.e. NULL
Similarly NULL * 10  ->  NULL
NULL * 0  -> NULL (Interesting)
NULL / 0   -> NULL (Amazing...no divide by zero exception!!!)

On a lighter note, it seems for MS SQL server creators NULL is bigger than 0 .

Three valued logic:
We know about Boolean algebra. The typical TRUE/FALSE thing. Here we compare two values which are known. It results in either TRUE or FALSE.
Now if we do comparison with a NULL  , then !!!We can’t get a Boolean value . Rather it results in UNKNOWN.So we now have 3 values possible from a comparison TRUE/FALSE/UNKNOWN . Its not surely bollean logic . Its three valued logic.
Lets see some example .

 DECLARE @var as VARCHAR(10)=NULL  
 IF @var='satya'  
 SELECT 1  
 ELSE  
 SELECT 0  
Here @var is NULL . I am trying to compare NULL with a known value. So as I said it will result as UNOWN. Sql server passes anything other than TRUE to else clause . So output is 0 .
Similarly for case statement any comparison not TRUE will go to else part.
SELECT CASE WHEN @var='satya' THEN 1 ELSE 0 END
Same is for while loop too. It will not enter the loop if comparison involves a NULL .

As 2 valued logic has a truth table , our three valued logic too have a truth table.
AND
TRUE
UNKNOWN
FALSE
TRUE
TRUE
UNKNOWN
FALSE
UNKNOWN
UNKNOWN
UNKNOWN
FALSE*
FALSE
FALSE
FALSE*
FALSE
*Simple rule of AND operator, if one expression results false whole operation is false
OR
TRUE
UNKNOWN
FALSE
TRUE
TRUE
TRUE*
TRUE
UNKNOWN
TRUE*
UNKNOWN
UNKNOWN
FALSE
TRUE
UNKNOWN
FALSE
*Simple rule of OR operator, if one expression results TRUE whole operation is true
NOT
TRUE
FALSE
FALSE
TRUE
UNKNOWN
UNKNOWN

Let’s have an example to demonstrate .

 CREATE TABLE #temp  
 (  
 name varchar(20),  
 age int  
 )  
 INSERT INTO #temp VALUES('satya',26)  
 INSERT INTO #temp VALUES('biswa',29)  
 INSERT INTO #temp VALUES('panu',15)  
 INSERT INTO #temp VALUES('rahul',NULL)  
 INSERT INTO #temp VALUES('devi',35)  

# temp contains below data.

Satya
26
Biswa
29
Panu
15
Rahul
NULL
Devi
35

Now ,
        SELECT * FROM #temp WHERE age>10 AND name='rahul'
Here age>10 evaluates to UNKNOWN and name='rahul' evaluates to TRUE  at 4th row. 

So comparison is done as  UNKNOWN AND TRUE. So from truth  table it is UNKNOWN. So it will not be included in result set.
Now let’s change a bit
SELECT * FROM #temp WHERE age>10 OR name='rahul'
Here for 4th row comparison is done as  'UNKNOWN OR TRUE' .From truth table it is TRUE.So 4th row will be included in resultset.
Similar test can be done while checking a query against a table with NULL data..

The NOT IN problem.
Lets create another temporary table in addition to #temp.

 CREATE TABLE #temp1  
 (  
 newage int  
 )  
 INSERT INTO #temp1 VALUES(26)  
 INSERT INTO #temp1 VALUES(15)  
 INSERT INTO #temp1 VALUES(NULL)  

#temp1's data

26
15
NULL
 
Now, 
SELECT * FROM #temp WHERE age IN(SELECT newage from #temp1)
  
Will translate to
SELECT * FROM #temp WHERE age = 26 OR age =15 OR age = NULL

Here where age=26 , comparison is done as TRUE OR FALSE OR UNKNOWN.->TRUE

For age = 15 , its FALSE OR TRUE OR UNKNOWN->TRUE

For age = NULL , its UNKNOWN OR UNKNOWN OR UNKNOWN -> UNKNOWN , so excluded
So RESULT :
satya    
26
panu     
15

Now see following
SELECT * FROM #temp WHERE age NOT IN(SELECT newage from #temp1)
Translates to
SELECT * FROM #temp WHERE age <> 26 AND age <>15 AND age <> NULL


When age = 26 , it is doing  FALSE AND TRUE AND UNKNOWN ->   FALSE

When age = 15 , its TRUE AND FALSE AND UNKNOWN ->  FALSE

WHEN age=NULL ,  its UNKNOWN OR UNKNOWN OR UNKNOWN ->  UNKNOWN

But when age = 29 , its TRUE AND TRUE AND UNKNOWN ->  UNKNOWN , so excluded(!!!)


So we ll have empty result set (which is wrong). So we need to be careful when using NOT IN.
One solution is 

SELECT * FROM #temp WHERE age NOT IN(SELECT newage from #temp1 where newage IS NOT NULL)

(We will see IS NOT NULL in detail in part2)

O/P
biswa
29
devi
35

But this will miss NULL rows . So  preferable way is to use NOT EXISTS  

SELECT * FROM #temp WHERE  NOT EXISTS(SELECT 1 from #temp1 WHERE age=newage)

Here any 'age'  in table #temp for which the subquery doesnot return any row will be included in resultset .

SELECT 1 from #temp1 WHERE age=newage  Translates to

SELECT 1 from #temp1 WHERE age=26 OR age=15 OR age=NULL

For age=NULL , it is doing  UNKNOWN OR UNKNOWN OR UNKNOWN  ->  UNKNOWN

So the row with a null  will be included in final output  of
SELECT * FROM #temp WHERE  NOT EXISTS(SELECT 1 from #temp1 WHERE age=newage)

O/P
biswa
29
rahul
NULL
devi 
35
Clearly this approach preferable as it will include null row .If requirement needs rows  without null we can easily modify it to our needs.

So always prefer to use NOT EXISTS ( I never use NOT IN in my stored procedures due to this reason) .