Classle

Query Optimization - Heuristic Approach

QUERY OPTIMIZATION

A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated. The scanner identifies the language tokens—such as SQL keywords, attribute names, and relation names—in the text of the query, whereas the parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language. The query must also be validated, by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried. An internal representation of the query is then created, usually as a tree data structure called a query tree. It is also possible to represent the query using a graph data structure called a query graph. The DBMS must then devise an execution strategy for retrieving the result of the query from the database files. A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.

The query optimizer module has the task of producing an execution plan, and the code generator generates the code to execute that plan. The runtime database processor has the task of running the query code, whether in compiled or interpreted mode, to produce the query result. If a runtime error results, an error message is generated by the runtime database processor.

 

Translating Queries in to Relational Algebra

An SQL query is first translated into an equivalent extended relational algebra expression—represented as a query tree data structure—that is then optimized. Typically, SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic operators and optimized. A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block. Hence, nested queries within a query are identified as separate query blocks. Because SQL includes aggregate operators—such as MAX, MIN, SUM, AND COUNT.

 

Eg:

SELECT LNAME, FNAME FROM EMPLOYEE

WHERE SALARY > (SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO=5);

The query contains nested query. Split into two query blocks.

C = (SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO=5);

SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > c;

where c represents the result returned from the inner block. The inner block could be translated into the extended relational algebra expression

MAX SALARY(sDNO=5(EMPLOYEE))

and the outer block into the expression

pLNAME, FNAME(sSALARY>C(EMPLOYEE))

The query optimizer would then choose an execution plan for each block. We should note that in the above example, the inner block needs to be evaluated only once to produce the maximum salary, which is then used—as the constant c—by the outer block. We called this an uncorrelated nested query. It is much harder to optimize the more complex correlated nested queries, where a tuple variable from the outer block appears in the WHERE-clause of the inner block.

 

Basic Algorithms for Query Executing

An RDBMS must include algorithms for implementing the different types of relational operations (as well as other types of operations) that can appear in a query execution strategy. The following are the various algorithms used:

1 External Sorting

2 Implementing the SELECT Operation

3 Implementing the JOIN Operation

4 Implementing PROJECT and Set Operations

5 Implementing Aggregate Operations

6 Implementing Outer Join

7 Combining Operations Using Pipelining

 

Using Heuristics in Query Optimization

One of the main heuristic rules is to apply SELECT and PROJECT operations before applying the JOIN or other binary operations. This is because the size of the file resulting from a binary operation—such as JOIN—is usually a multiplicative function of the sizes of the input files. The SELECT and PROJECT operations reduce the size of a file and hence should be applied before a join or other binary operation.

ü       Cost-based optimization is expensive, even with dynamic programming.

ü       Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.

ü       Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance:

·         Perform selection early (reduces the number of tuples)

·         Perform projection early (reduces the number of attributes)

·         Perform most restrictive selection and join operations (i.e. with smallest result size) before other similar operations.

·         Some systems use only heuristics, others combine heuristics with partial cost-based optimization.

 

Eg:

 

Pcustomer_name((sbranch_city = “Brooklyn  (branch)     account)     depositor)

 

1)       When we compute

                        (sbranch_city = “Brooklyn” (branch)    account )


we obtain a relation whose schema is:
(branch_name, branch_city, assets, account_number, balance)

2)       Push projections using equivalence rules; eliminate unneeded attributes from intermediate results to get:
 
Pcustomer_name ((
   
Paccount_number ( (sbranch_city = “Brooklyn” (branch)     account ))   
                                             depositor )

3)       Performing the projection as early as possible reduces the size of the relation to be joined.

 

 

 

 

 

 

 

References:

 

Fundamentals of Database Systems – Elmasri Navathe

codex.cs.yale.edu/avi/db-book/slide-dir/ch14.ppt

 


Dear Guest,
Spend a minute to Register in a few simple steps, for complete access to the Social Learning Platform with Community Learning Features and Learning Resources.
If you are part of the Learning Community already, Login now!
4
Your rating: None Average: 4 (1 vote)

Collaborate

Posted by



Wed, 05/20/2009 - 16:14

Share