About SQL Queries

Mar 31st, 2013 | By | Category: Database

Structured Query Language is a declarative language, which means we don’t need to know what happens behind the scenes, we just know what we are going to accomplish but not how we are going to do that. Meaning, we can not / do not know whether any arrays, loops etc are involved with a sql query when it is getting executed, we just know we are fetching data using the statement, and we are really happy when we get the expected resultset using a purely English-like-language. But to tune the queries we need to open the curtains and have a peek on what is going on behind the scene. To understand how a query works, lets have a look in the following picture.


Now we’ll discuss the process step by step.

Parsing – During the parsing phase, the statement is broken down into it’s component parts – determining what type of a statement it is – whether it is a DML or a DDL or a select query. Then the syntax and the semantics are getting checked. The syntax check validates the statement based on the SQL grammar. The semantics checks make sure whether the statement is a valid one in the light of the schema objects that you have i.e. do those tables, views etc. exist in the schema or not and whether you have the access to those objects and are proper privileges in place for you to execute that statement. All those "ambiguous column name" , "table or view does not exists" etc errors come from here.

The next stage in the parsing process is either a soft parse or a hard parse. If the statement that we are currently parsing has been processed by some other session, then we can skip the next two steps which are optimization and row source generation – if we are able to skip those two steps, it is called as a soft parse (this term is not frequently used and is more common in the Oracle world). If we cannot find a suitable plan in the memory, then it has to do all the steps i.e. parse, optimize and generate the plan for the query. This is called as a hard parse.

In the parsing stage some syntax based optimization also takes place. Like, converting right outer joins to equivalent left outer joins, flattening sub-queries to an equivalent join syntax etc.

Optimization – Well, Oracle Optimizer itself is a topic for a large note, so I’ll not go into the details here, will cover it in my next note. Here I’ll give a brief idea what the heck the optimizer is.

The optimizer determines the most efficient way to execute a SQL statement after considering many factors related to the objects referenced and the conditions specified in the query. This determination is an important step in the processing of any SQL statement and can greatly affect execution time.

In the picture above you can see, two types of Optimizers are mentioned. Rule based and Cost based, Oracle 10g onwards, RBO is deprecated and CBO is preferred over it. So lets have a look on their definitions.


Rule-Base Optimizer

used a heuristic method(based on some set of rules or a general formula) to select among several alternative access paths with the help of certain rules. All paths were ranked and the lowest was chosen i.e. using the ROWID was a cost of 1, a full table scan was a cost of 19. the ranking system which is referred in RBO can be found in this link – http://docs.oracle.com/cd/B10500_01/server.920/a96533/rbo.htm#38893


Cost-based Optimizer

The CBO uses statistics on tables and indexes, the order of tables and columns in the SQL statements, available indexes, and any user-supplied hints to pick the most efficient way to access the data requested.

CBO almost always performs better than RBO.

In the next note I’ll discuss more about the optimization, explain plan and different terminologies used for alternative paths generated by the explain plan. Here lets move to the next step, Row Source Generator.

Row Source Generator – The definition as mentioned in the Oracle doc states, The row source generator receives the optimal plan from the optimizer. It outputs the execution plan for the SQL statement. The execution plan is a collection of row sources structured in the form of a tree. A row source is an iterative control structure. It processes a set of rows, one row at a time, in an iterated manner. A row source produces a row set.

I think the definition is a little tough to crack for its jargon. I tried for simpler explanations in other sites, but none was found to my disappointment. So lets decipher it by simple common sense. By iterative control structure we are sure some loop/ GOTO statement is involved. So what I can understand from the above definition is, the execution plan is nothing but a set of code which tells the executor to iterate through a set of rows and process it as the query asked to do.

SQL Execution – Here first I’ll tell you what my common sense tells me about the sql executor, then I’ll go for the definition used in the Oracle doc. As I assumed, we now have a set of code to loop through the rows and do some processing as the final output of the row source generator. Now we are going to feed this code to the executor, who will run those code and output the desired result.

And the Doc says, The SQL execution engine is the component that operates on the execution plan associated with a SQL statement. It then produces the results of the query. Each row source produced by the row source generator is executed by the SQL execution engine. Which I think go hand in hand with my common sense..

And with this you get some records inserted/updated/deleted or fetched according to your need by writing just some pure old English statement. So we discussed briefly how a query gets executed.

As I mentioned in this note, I’ll surely continue with the optimizer in my next few notes (as I’m perfectly sure, it can not be explained thoroughly in one single note).

Kick It on DotNetKicks.com