Tuesday, April 30, 2013

Database roles - 2. Big...Mistakes?

Looks like my Friday post about BigData got some attention. In that case - let's continue! While looking at tons of articles about Big-anything, I've noticed a very consistent pattern - everybody is talking about multi-node clusters of database servers: you constantly hear about horizontal scalability, sharding, node failure tolerance, etc. But let's ask the question - is it only me, or are we over-complicating our development patterns? Maybe we just don't know how to properly use the hardware resources we already have?

Before going any further, let me tell you a "war story": a couple of years ago a system built by us at Dulcian was compared to a similar system (both in scope and functionality). At one of the tender kick-off meetings our competitors loudly accused us of lying in the proposal! The reasoning was very simple - we claimed that our entire required hardware was one database server (16 cores total) utilized at 15% and two application servers (4 cores each) utilized at 5%. For our competitors such a footprint was plainly impossible - to do exactly the same job they had been burning through a couple of racks of servers!
Yes, sure enough - at their request our servers were counted, and the count matched what was in the proposal!

That's the price to pay for architectural mistakes: for "them" it was the "new norm" to require a separate application server for every 100 users - and for years nobody questioned that ratio! So, let me ask - did our IT industry fall into the same trap and become accustomed to bad performance? Did we forget to wonder WHY we need that extra app server for every 100 users?

As far as I can see from attending different conferences, current software/database architects prefer to have the same problems as their neighbors (a pretty good "get-out-of-jail" card to show to management, isn't it?) rather than doing proper tuning of their systems. It is very rare for a company to do a real honest-to-God performance review - and even if it happens, it is usually at the wrong time and against the will of the current IT staff.

Recently I talked to a number of top specialists who make a living out of fixing other people's systems - and all of them repeat the same story:
  • Once upon a time there was an existing IT system that performed badly.
  • A "contemporary architect" proposed a complete rewrite of the whole system using "the new way".
  • Management looked at the cost of the rewrite and went ballistic!
  • Somebody on the IT staff finally proposed hiring a performance tuning expert/guru-DBA - at least to be able to say afterwards that every possible avenue had been tried!
  • The "hired gun" found a lot of "interesting things" within a couple of weeks (while wasting about 3/4 of the time fighting uncooperative locals). Some of those issues were fixable, some were not - but the system still started working faster (and often faster than anybody expected).
  • The "hired gun" got a big check (and curses behind his back from the IT staff).
  • The "contemporary architect" got a pink slip (and also curses behind his back from IT management).
As you can see, nobody is happy at the end! But the whole situation started because the local IT specialists were not able to correctly evaluate existing resources and solutions. Yes, we all would like to be trend-compliant, but jumping into a system overhaul without a good reason is as dangerous as falling behind the technology curve.

Summary: your IT may not be as big as you think!


Monday, April 29, 2013

Dynamic SQL and RESULT_CACHE

Over the weekend one of my friends asked me for a clarification on RESULT_CACHE - a really interesting piece of functionality that allows Oracle to cache the results of PL/SQL function calls in the SGA.

The point of confusion was the fact that between 11gR1 and 11gR2 the Oracle development team significantly re-thought this feature: while in R1 developers needed to explicitly specify the objects that should be monitored for cache invalidation (via the RELIES_ON clause), starting with R2 Oracle takes care of it automatically (the RELIES_ON clause is completely ignored).
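Just to make the difference visible, here is a minimal sketch (my own illustration, not part of the original question) of how the 11gR1-style declaration of the function discussed below would look, with the dependency list spelled out by hand:

create or replace function f_getCount_nr (i_tab_tx varchar2)
return number
result_cache relies_on (emp) -- 11gR1: monitored objects listed explicitly
is
    v_out_nr number;
begin
    execute immediate 
      'select count(*) from '||i_tab_tx into v_out_nr;
    return v_out_nr;
end;
/

Starting with 11gR2 the RELIES_ON list is parsed but ignored, so a plain RESULT_CACHE clause (as in the white-paper snippet below) is all you need.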

Interestingly enough, Oracle will monitor not only direct dependencies! If the function makes calls to other objects via Dynamic SQL, those will be monitored too. Last year I had a whole section on this topic at ODTUG KScope'12 ("Dynamic SQL in 11g World")! Here is a snippet from my white-paper:

Integration of Dynamic SQL and RESULT_CACHE

Another good thing about the current implementation of Dynamic SQL is that Oracle’s PL/SQL team actively integrates it with other advanced features. For example, the “result cache” introduced in 11gR1 (and significantly rewritten in 11gR2) is smart enough not only to auto-detect hard-coded dependencies, but also to recognize and record on-the-fly calls made via Dynamic SQL. The following example includes a function that gets the current row count for a given table:

create or replace function f_getCount_nr (i_tab_tx varchar2)
return number
result_cache
is
    v_out_nr number;
begin
    -- the dependency on the queried table is known only at runtime
    execute immediate 
      'select count(*) from '||i_tab_tx into v_out_nr;
    return v_out_nr;
end;
/

Step #1 is to confirm that (a) the result cache actually works and (b) it recognized the on-the-fly dependency.

SQL> select f_getCount_nr('EMP') from dual;
F_GETCOUNT_NR('EMP')
--------------------
                  14

SQL> select ro.id,  ro.name,  do.object_name
  2  from   v$result_cache_objects    ro,
  3         v$result_cache_dependency rd,
  4         dba_objects               do
  5  where  ro.id = rd.result_id
  6  and    rd.object_no = do.object_id;

ID NAME                                                   OBJECT_NAME
-- ----------------------------------------------------- -------------
1  "SCOTT"."F_GETCOUNT_NR"::8."F_GETCOUNT_NR"#8440... #1  EMP
1  "SCOTT"."F_GETCOUNT_NR"::8."F_GETCOUNT_NR"#8440... #1  F_GETCOUNT_NR

SQL> select f_getCount_nr('EMP') from dual;
F_GETCOUNT_NR('EMP')
--------------------
                  14

SQL> select *
  2  from v$result_cache_statistics
  3  where  name in ('Create Count Success','Find Count');
ID  NAME                 VALUE
--- -------------------- ------
5   Create Count Success 1
7   Find Count           1
SQL>

Oracle successfully recognized the EMP table as a cache dependency and was able to return a value from the cache when the function was called a second time. Now, to test cache invalidation, I will insert a new row into the EMP table and re-fire the function F_GETCOUNT_NR.

SQL> insert into emp(empno) values (100);
1 row created.
SQL> commit;
Commit complete.

SQL> select f_getCount_nr('EMP') from dual;
F_GETCOUNT_NR('EMP')
--------------------
                  15

SQL> select id, name, value
  2  from v$result_cache_statistics
  3  where  name in ('Create Count Success',
  4                   'Find Count','Invalidation Count');

ID  NAME                 VALUE
--- -------------------- ------
5   Create Count Success 2
7   Find Count           1
8   Invalidation Count   1
SQL>

This time, Oracle successfully detected the data change and invalidated the previously cached information. Now let’s introduce a new dynamic dependency (on the DEPT table) and see whether the result cache successfully recognizes the difference.

SQL> select f_getCount_nr('DEPT') from dual;
F_GETCOUNT_NR('DEPT')
---------------------
                    4
SQL> select id,  name,  object_name
  2  from   v$result_cache_objects    ro,
  3         v$result_cache_dependency rd,
  4         dba_objects               do
  5  where  ro.id = rd.result_id
  6  and    rd.object_no = do.object_id;

ID  NAME                                                            OBJECT_NAME
--- --------------------------------------------------------------- -------------
3   "SCOTT"."F_GETCOUNT_NR"::8."F_GETCOUNT_NR"#8440831613f0f5d3 #1  EMP
3   "SCOTT"."F_GETCOUNT_NR"::8."F_GETCOUNT_NR"#8440831613f0f5d3 #1  F_GETCOUNT_NR
4   "SCOTT"."F_GETCOUNT_NR"::8."F_GETCOUNT_NR"#8440831613f0f5d3 #1  DEPT
4   "SCOTT"."F_GETCOUNT_NR"::8."F_GETCOUNT_NR"#8440831613f0f5d3 #1  F_GETCOUNT_NR

SQL> select id, name, value
  2  from v$result_cache_statistics
  3  where  name in ('Create Count Success',
  4                   'Find Count','Invalidation Count');

ID  NAME                 VALUE
--- -------------------- ------
5   Create Count Success 3
7   Find Count           1
8   Invalidation Count   1
SQL>

As you can see, a variation of the result cache entry with a dependency on DEPT (rather than EMP) was immediately recognized. This means that Dynamic SQL is indeed fully integrated into the overall on-the-fly caching mechanism.
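P.s. If you want to repeat these experiments from a clean slate, the DBMS_RESULT_CACHE package can help - here is a small sketch (run it from an account with the appropriate privileges):

set serveroutput on
-- report how much memory the result cache currently uses
exec dbms_result_cache.memory_report
-- check whether the result cache is enabled
select dbms_result_cache.status from dual;
-- wipe everything cached so far (handy between test runs)
exec dbms_result_cache.flush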

Friday, April 26, 2013

Database roles: BigData, BigUsers, Big...Problems?

Today at lunch we had an interesting discussion about the role of databases in contemporary IT development - and a lot of buzz-words were thrown across the table: Hadoop, NoSQL, BigData, BigUsers (reasonably small data + a high number of concurrent users), ACID-compliance, etc. To be fair, a few too many buzz-words - I have a very bad feeling that even contemporary architects have stopped understanding the core concepts behind their work!

Let's start from the key point - we have databases to manage DATA. And one of the key elements of that task is to make sure that data is reliably stored and retrieved. And here is the catch - what do we mean by reliable? Or, to be precise - what happens to you/your company/your customers if some piece of data is lost forever/unrecoverable? The answer to this question drives the whole technology stack! For example, if you work with medical/legal/official data, a small chunk of lost information (if noticed) could mean litigation at best and people's lives at worst!

Let's be clear - the majority of current NoSQL DB solutions are explicitly not ACID-compliant (or at least not 100% ACID-compliant). For example, I found a pretty good analysis of MongoDB and CouchDB - and it is clear that even their proponents say that there are always trade-offs between performance and data reliability. Some articles even suggest a double-environment implementation, where you have a NoSQL database for non-critical data plus an RDBMS for critical data.

Just to clarify - what do I mean by ACID-compliance:
  • Atomicity requires that each transaction is executed in its entirety, or fails without any change being applied.
    • I.e. if you have a successful INSERT and a successful DELETE in the same transaction, either both of them are committed or neither is (see the small SQL sketch below).
  • Consistency requires that the database only passes from one valid state to the next, without intermediate points.
    • I.e. it is impossible to catch the database in a state where some rules on the stored data (for example, a PK) are not yet enforced.
  • Isolation requires that if transactions are executed concurrently, the result is equivalent to their serial execution. A transaction cannot see the partial results of another one.
    • I.e. each transaction works in its own realm until it tries to commit the data.
  • Durability means that the result of a committed transaction is permanent, even if the database crashes immediately afterwards or in the event of a power loss.
    • I.e. it is impossible to have a situation where the application/user thinks the data is committed, but after a power failure it is gone.
As we can see from that list, all of these requirements are technically very challenging to implement, especially with a high number of concurrent users and significant data volumes - that's why Oracle went to such extremes with its UNDO/REDO/LOG mechanisms. But that's the price to pay for being sure that if you saved the data, it will NEVER disappear.
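Here is that atomicity bullet in plain SQL - a trivial sketch against the SCOTT schema (EMPNO values 9999 and 7900 are just sample data):

-- both statements below belong to the same transaction
insert into emp (empno, ename, deptno) values (9999, 'TEST', 10);
delete from emp where empno = 7900;
-- COMMIT makes both changes permanent together...
commit;
-- ...while a ROLLBACK issued instead of the COMMIT would have undone both,
-- leaving no trace of either statement.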

I understand that there are environments where that small chance of data loss can be, if not ignored, at least tolerated: we all know that Craigslist runs on MongoDB - so, what's the impact of one lost ad? Somebody might get annoyed, but that's all!

However, when I start hearing about medical systems being built on NoSQL solutions, I start to get nervous. Maybe in a couple of years, before going to the doctor, I will first check what kind of software they use! Just to feel safer...

Wednesday, April 24, 2013

Performance tuning. Spending time is NOT OK (if you do not know exactly why)

Yet another performance tuning story, similar to the one that happened about a month ago. The sad part - something was coded years ago and we never questioned the time spent... Just to keep it short:
  • once upon a time there was a very time-consuming module.
  • eventually we were forced to take a much closer look at WHY it was so time-consuming.
  • we found a function call in one of our own views that was completely unnecessary.
  • this function was so light that initially we ignored it altogether, but later we realized that we had been calling it 80,000 times - and in that case even the lightest cost adds up.
  • removing this function (all of the needed data was already available via joined tables) took the cost of a class from 40 seconds down to 3.
  • Profit! :-)
We knew that the module was extremely complicated and that a lot of database operations were involved - so we assumed that the timing was absolutely fine! Of course, we hit the case where assumptions are the worst enemies of real knowledge... Especially when we are talking about performance tuning - because unless we have proven where exactly we are losing N seconds in module X, we cannot say that module X cannot take less than N seconds. Those N seconds could be related to a different module/an outdated coding technique/a structural change, etc. Only by knowing what exactly is going on can we make a decision whether it is OK or not.
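The crudest possible way to start proving where the time goes is simply to measure it - a minimal sketch (the SELECT inside is just a stand-in for whatever suspicious code you are investigating):

declare
    v_start_nr number;
    v_cnt_nr   number;
begin
    v_start_nr := dbms_utility.get_time;      -- hundredths of a second
    select count(*) into v_cnt_nr from emp;   -- stand-in for the real workload
    dbms_output.put_line('Elapsed: '||
        (dbms_utility.get_time - v_start_nr)/100||' sec');
end;
/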

Summary: A couple of lessons learned:
  1. If something does not perform well in the existing system - check it (unless it is something known and documented)! You may think that the time loss is inevitable, but who knows?
  2. Function calls in SELECT statements can cause a lot of issues even if each call is light - you must think not only about the cost of a single function call, but about how many times this function will be fired.
By the way, at the recent IOUG Collab'13, in the presentation "Top 5 Issues that Cannot be Resolved by DBAs (other than missed bind variables)", I covered the second topic (the number of calls) in a bit more depth. Below is a snippet from my white-paper:

Number of calls in SELECT clause

There are multiple ways of ensuring that a function referenced in the SELECT clause is not fired more often than needed. Unfortunately, few developers are even aware of this problem. My recommendation is to include the following set of examples in any PL/SQL class, because it explicitly illustrates the difference between real understanding and guessing.

First, set up a basic environment to count the total number of calls: a package variable to store the counter, a simple function, and a checker procedure to display/reset the counter:

create or replace package misha_pkg is
    v_nr number:=0;   -- call counter
end;
/

create or replace function f_change_tx (i_tx varchar2)
return varchar2 is
begin
    misha_pkg.v_nr:=misha_pkg.v_nr+1;
    return lower(i_tx);
end;
/

create or replace procedure p_check is
begin
    dbms_output.put_line('Fired:'||misha_pkg.v_nr);
    misha_pkg.v_nr:=0;
end;
/

Second, run a very simple query against the table EMP, where the function above is applied to EMP.JOB. Keep in mind that there are 14 total rows in the table EMP:

SQL> select empno, ename, f_change_tx(job) job_change_tx
  2  from emp;
   ...
14 rows selected.
SQL> exec p_check
Fired:14
PL/SQL procedure successfully completed.

If you just use the function, it will be fired for every row. But we know that there are only 5 distinct JOB values, so we should try to decrease the number of calls. In Oracle 11gR2, there is a very interesting internal mechanism called “scalar sub-query caching” that is used while processing SQL queries. It allows Oracle to internally reuse previously calculated results of SELECT statements if they are called multiple times in the same query. The following example tests whether using this mechanism helps:

SQL> select empno, ename, (select f_change_tx(job) from dual)
  2  from emp;
  ...
14 rows selected.
SQL> exec p_check
Fired:5
PL/SQL procedure successfully completed.
SQL>

The result shows that it did help. Now, only five distinct calls are registered, which is exactly as needed. Although, since we are discussing caching, why not use it explicitly? There is another very powerful feature called the “PL/SQL function result cache.” The following example enables it on the function, and then the same query is run two times:

create or replace function f_change_tx (i_tx varchar2)
return varchar2 result_cache is
begin
    misha_pkg.v_nr:=misha_pkg.v_nr+1;   
    return lower(i_tx);
end;

SQL> select empno, ename, f_change_tx(job) from emp;
...
14 rows selected.
SQL> exec p_check
Fired:5
SQL> select empno, ename, f_change_tx(job) from emp;
...
14 rows selected.
SQL> exec p_check
Fired:0

The result is impressive! While the first call matches the sub-query caching result, the second call is a fantastic example of great performance tuning – everything works as needed, but nothing is being done (actually, this is not 100% true, since the cache still has to be probed, but for practical purposes it is a very simple PK lookup).

Tuesday, April 23, 2013

Collaborate'13 Summary. Part 4. Database development


This post is the final part of my thoughts about the IOUG Collaborate'13 conference (part one, part two, part three). If I do not write it now, I will forget what was going on :-), so here comes the last push!

4. Real stuff

In all of the previous posts I covered either topics of a more abstract nature (like leadership) or topics that satisfied my curiosity (MySQL, BigData, NoSQL, etc). But the majority of people go to a conference to learn something that will be directly applicable to their daily lives. And of course I went to a number of talks that cover my main topic - database development.

Overall, from the purely database side this conference was a bit strange - everybody knew that Oracle 12c was upcoming, but nobody knew when. And even worse, Oracle employees couldn't even say the name "12c" (because their marketing department STILL hadn't approved it)! The funny part - in the conference hall there was a big book-store with an enormous Oracle Press sign stating that "Oracle 12c Complete Reference" was scheduled for June 2013. As you can see, strange world...

The most talked-about feature of the Oracle "we-don't-know-what" this year happened to be the notion of "pluggable databases" - there is already a ton of material on this topic, but conceptually you can think of it as an extended version of multiple instances on the same physical box, with a separate "controller" instance on top that manages all resources and can access all of them. It seems that this way Oracle can indeed significantly decrease the hardware footprint - at least all of the people I've been talking to agree that the feature is very powerful.

To be sure, there were a number of great talks about different database features. For example, Professor Carl Dudley did a great job finally explaining to me why I PERSONALLY should care about ANSI SQL in Oracle - as it turns out, using that syntax you can avoid the following error: "ORA-01719: outer join operator (+) not allowed in operand of OR or IN". I think I will do a whole post on this topic later. Also, it was pretty fun to listen to Maxym Kharchenko talking about different approaches to pagination. A bit of a surprise at that presentation was the response of the audience when I wondered about consistent data reads while doing paging - it looks like NOBODY cared about it. Hm-m-m-m-m, definitely another post needed....
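Since I mentioned Carl's point, here is a quick sketch of my own (not from his slides) showing the idea against the SCOTT schema:

-- traditional syntax: a predicate containing (+) cannot be an operand of OR,
-- so this raises ORA-01719
select e.ename, d.dname
from   emp e, dept d
where  e.deptno = d.deptno(+)
   or  e.job = 'PRESIDENT';

-- ANSI syntax: an OR inside the outer join condition is perfectly legal
select e.ename, d.dname
from   emp e
       left join dept d
         on (e.deptno = d.deptno or e.job = 'PRESIDENT');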

Another big topic (including my own presentation) was the lack of communication between DBAs and Developers - as a result, there is a major disconnect between the people who know the business side of database processes and the people who know the internals. And from either side of the aisle the total picture is blurred. I am glad that the whole community is finally starting to understand the problem - all of those DBA/Developer talks had a great turnout, and maybe some of them will even become IOUG webcasts later. I am really looking forward to seeing "my dream":

  • Architects get DBAs involved in the system from the very beginning
  • Developers talk to DBAs about possible impacts of different features/options
  • DBAs actively teach developers about available features and better ways of writing code
I understand that it's a hard sell - but why can't I have a dream? People, let's keep dreaming - and until the next conference!

Oracle Internals. Digging out PL/SQL fine-grain dependencies

After a couple of requests at IOUG Collab'13, I decided to resurrect a pretty old topic that I participated in back in 2008 (since in 5 years there have been no changes on that front :-) ).

As we all know, starting with version 11g Oracle tracks PL/SQL dependencies in a much more granular way - instead of treating a PL/SQL package as a single entity, each separate function/procedure/variable now has its own chain of references. As a result, if you, say, add a new procedure - there is no invalidation! The same pattern was also applied to table references - adding a new column no longer forces invalidation either.

Unfortunately, Oracle did not update the *_DEPENDENCIES views - this information is invisible to common users. Fortunately, there are always brave people who look at Oracle internals in search of higher wisdom! My esteemed colleagues Rob Van Wijk and Toon Koppelaars did some crazily inventive research and came up with a solution for columns. Of course, it was not perfect, but their DBA_DEPENDENCY_COLUMNS view helped me a lot in a couple of our recent projects - up to the point that now I set it up by default in all of our development databases.

I also decided that if those guys can do such tricks, it may be possible to do something similar for PL/SQL. And indeed, I was able to come up with my own DBA_DEPENDENCY_ARGS view! Even with two variations of it, using different sources:
  1. SYS.plscope_identifier$+SYS.plscope_action$ (PL/Scope generated information+its lookup)
  2. SYS.procedureinfo$ (real PL/SQL source)
As I found later, SYS.procedureinfo$ was not reliable. It worked fine only if the package spec contained nothing but procedure/function declarations, while any package-level type declarations or global variables completely messed up the line counts. So, yes, we are in PL/Scope land (by the way, a great tool if you are not aware of it!).
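If PL/Scope is new to you, the minimal "hello world" looks roughly like this (a quick sketch; MISHA_P1 is the package created later in this post):

-- collect identifier information for everything compiled in this session
alter session set plscope_settings = 'IDENTIFIERS:ALL';

-- recompile (or create) the objects of interest, then:
select name, type, usage, line, col
from   user_identifiers
where  object_name = 'MISHA_P1'
order  by line, col;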

And here is the beast (please check Rob's post about WHY it works :-) ). It has to be run from SYS, and it will create a view, a synonym, and a public grant.

create view dba_dependency_args as
select d.u_name   owner
       , d.o_name  name
       , decode
         ( d.o_type#
         , 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER'
         , 4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE', 7, 'PROCEDURE'
         , 8, 'FUNCTION', 9, 'PACKAGE', 10, 'NON-EXISTENT'
         , 11, 'PACKAGE BODY', 12, 'TRIGGER'
         , 13, 'TYPE', 14, 'TYPE BODY', 22, 'LIBRARY'
         , 28, 'JAVA SOURCE', 29, 'JAVA CLASS'
         , 32, 'INDEXTYPE', 33, 'OPERATOR'
         , 42, 'MATERIALIZED VIEW', 43, 'DIMENSION'
         , 46, 'RULE SET', 55, 'XML SCHEMA', 56, 'JAVA DATA'
         , 59, 'RULE', 62, 'EVALUATION CONTXT'
         , 92, 'CUBE DIMENSION', 93, 'CUBE'
         , 94, 'MEASURE FOLDER', 95, 'CUBE BUILD PROCESS'
         , 'UNDEFINED'
         ) type
       , nvl2( d.po_linkname, d.po_remoteowner, d.pu_name) 
              referenced_owner
       , d.po_name referenced_name
       , decode
         ( d.po_type#
         , 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER'
         , 4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE', 7, 'PROCEDURE'
         , 8, 'FUNCTION', 9, 'PACKAGE', 10, 'NON-EXISTENT'
         , 11, 'PACKAGE BODY', 12, 'TRIGGER'
         , 13, 'TYPE', 14, 'TYPE BODY', 22, 'LIBRARY'
         , 28, 'JAVA SOURCE', 29, 'JAVA CLASS'
         , 32, 'INDEXTYPE', 33, 'OPERATOR'
         , 42, 'MATERIALIZED VIEW', 43, 'DIMENSION'
         , 46, 'RULE SET', 55, 'XML SCHEMA', 56, 'JAVA DATA'
         , 59, 'RULE', 62, 'EVALUATION CONTXT'
         , 92, 'CUBE DIMENSION', 93, 'CUBE'
         , 94, 'MEASURE FOLDER', 95, 'CUBE BUILD PROCESS'
         , 'UNDEFINED'
         ) referenced_type
       , d.po_linkname referenced_link_name
       , c.name referenced_arg
       , decode(bitand(d.d_property, 3), 2, 'REF', 'HARD') 
          dependency_type       
       ,d.obj#,
       d.colpos
    from ( select obj#
                , u_name
                , o_name
                , o_type#
                , pu_name
                , po_name
                , po_type#
                , po_remoteowner
                , po_linkname
                , d_property
                , colpos                
             from sys."_CURRENT_EDITION_OBJ" o
                , sys.disk_and_fixed_objects po
                , sys.dependency$ d
                , sys.user$ u
                , sys.user$ pu
            where o.obj# = d.d_obj#
              and o.owner# = u.user#
              and po.obj# = d.p_obj#
              and po.owner# = pu.user#
              and d.d_attrs is not null
            model
                  return updated rows
                  partition by
                  ( po.obj#        obj#
                  , u.name         u_name
                  , o.name         o_name
                  , o.type#        o_type#
                  , po.linkname    po_linkname
                  , pu.name        pu_name
                  , po.remoteowner po_remoteowner
                  , po.name        po_name
                  , po.type#       po_type#
                  , d.property     d_property
                  )
                  dimension by (0 i)
                  measures (0 colpos, substr(d.d_attrs,9) attrs)
                  rules iterate (1000)
                        until (iteration_number = 
                                 4 * length(attrs[0]) - 2)
                  ( colpos[iteration_number+1]
                    = case bitand
                           ( to_number
                             ( substr
                               ( attrs[0]
                               , 1 + 2*
                                  trunc((iteration_number+1)/8)
                               , 2
                               )
                             ,'XX'
                             )
                           , power(2,mod(iteration_number+1,8))
                           )
                      when 0 then null
                      else iteration_number+1
                      end
                  )
         ) d
       , ( select i.obj#, 
                  i.symrep name,
                  row_number() over(partition by i.obj# 
            order by a.action#) procedure#
              from sys.plscope_identifier$ i,
                   sys.plscope_action$ a
              where i.signature = a.signature
              and a.action=1 -- declaration
              and context#=1 -- root elements (for now)       
          ) c
   where d.obj# = c.obj#
     and d.colpos = c.procedure#
/
create public synonym dba_dependency_args for sys.dba_dependency_args
/
grant select on dba_dependency_args to public
/

And here is proof that this "monstrosity" does at least something useful (please do not forget to enable PL/Scope before compiling anything: ALTER SESSION SET plscope_settings='IDENTIFIERS:ALL')! First, I created package #1 that contains all of the referenced objects:


create or replace package misha_p1
is   
  type rec_t is record (a number, b varchar2(10));
  type rec_tt is table of rec_t;
  
  v_rec_tt rec_tt;

  procedure p1a;  

  v_misha_p1_v1_nr number;
  v_misha_p1_v2_nr constant number:=1;

  procedure p1b(a number:=null, b number:=null, c number:=null);  

end;
/

Second, I created package #2 that has the following list of references to package #1:
  •  package spec:
    • variable V_MISHA_P2_V1_NR references a variable from P1 as a default value
    • procedure P2B has a variable from P1 as a default parameter
  • package body:
    • procedure P2A has a variable of a type defined in P1
    • procedure P2A changes global variable defined in P1
    • procedure P2B calls a procedure from P1
    • procedure P2B has a variable from P1 as a default parameter
create or replace package misha_p2
is
  procedure p2a;
  v_misha_p2_v1_nr number:=misha_p1.v_misha_p1_v2_nr;
  procedure p2b (in_p2b_nr number:=misha_p1.v_misha_p1_v1_nr);
end;
/
create or replace package body misha_p2
is
  -- this is a remark
  procedure p2a
  is
    v_tt misha_p1.rec_tt;
  begin
    misha_p1.v_misha_p1_v1_nr:=1;   
  end;
  
  procedure p2b(in_p2b_nr number:=misha_p1.v_misha_p1_v1_nr) is 
  begin  
    -- this is also a remark
    misha_p1.p1a;
  end;
end;
/

And now the most interesting part - let's check whether we got all 6 dependencies correctly:

SQL> select name,type,referenced_name,referenced_type,referenced_arg
  2  from dba_dependency_args
  3  where owner = user;

NAME       TYPE            REFERENCED REFERENCED REFERENCED_ARG
---------- --------------- ---------- ---------- ----------------
MISHA_P2   PACKAGE BODY    MISHA_P1   PACKAGE    REC_TT
MISHA_P2   PACKAGE BODY    MISHA_P1   PACKAGE    P1A
MISHA_P2   PACKAGE BODY    MISHA_P1   PACKAGE    V_MISHA_P1_V1_NR
MISHA_P2   PACKAGE BODY    MISHA_P1   PACKAGE    V_MISHA_P1_V2_NR
MISHA_P2   PACKAGE         MISHA_P1   PACKAGE    V_MISHA_P1_V1_NR
MISHA_P2   PACKAGE         MISHA_P1   PACKAGE    V_MISHA_P1_V2_NR

6 rows selected.

SQL>

Hmm... I've got everything I was looking for... Strange - but good (for a change :-) ).

Summary: we are in undocumented land, people, but if it can make your life easier - go for it! I will be glad to hear whether this approach helped anybody - please post your comments here.

P.s. One of my favorite articles about PL/Scope is by Steven Feuerstein - a lot of useful queries.

Monday, April 22, 2013

Collaborate'13 Summary. Part 3. BigData confusion

This post is a continuation of my thoughts about the IOUG Collaborate'13 conference (part one, part two). As I mentioned in the preview, the whole question of BigData was on my short-list. First of all, I am still trying to figure out what we are actually talking about - but since there were a number of talks on this subject, I hoped that I could hear some words of wisdom from my esteemed colleagues.

3. BigData

If you are wondering whether I came back from the conference with definitive answers - sorry, but no. The BigData community is still a very amorphous structure with a lot of ideas/tools/concepts floating around. Unfortunately, some software vendors (let's not name them, please) are also trying to fish in those muddy waters, and as a result some presentations were about 90% marketing / 10% content, which further complicated my attempts to get "the big picture". Still, there are some common ideas floating around:
  • people should not mix up BigData and NoSQL, but they do! And that's one of the main areas of confusion:
    • BigData (Hadoop+MapReduce) is an extension of good old data mining - you just mine much more data, much faster, on much cheaper hardware. It makes life easier because you can store the data first, in whatever way/shape/form it comes, and try to make use of it later - there is no need to define a structure beforehand.
    • NoSQL is the environment where you can WORK with that (non/semi-)structured data very efficiently from the very beginning. Minor problem - each existing solution is optimized for a pretty narrow set of tasks. Yes, it is well-optimized, but you need to think hard to select the right tool for the right problem.
  • HADOOP is huge as a storage mechanism. Minor problem - as far as I understood, anything below a 50-node cluster is considered a toy-box. So you need to have a real issue to be solved by it, because starting budgets are $300k+.
  • NoSQL usually comes in conjunction with a regular RDBMS and is rarely standalone, especially if you care about high reliability/control/audit, etc. It is considered a side performance booster - but it can also mask MAJOR problems in the RDBMS. I've heard a number of anecdotes where IT staff introduced NoSQL for performance reasons, but at some point they hired Oracle performance experts - and then scrapped the NoSQL solution, because the Oracle RDBMS started to crunch data as fast as was needed. So, if you expect that your bad coders will write good code just because of a technology change - you may not be 100% right.
  • Toolsets are in disarray - and everybody has their own preferences. There is a lo-o-o-o-ng way to go before this environment matures.
Summary: data growth by itself and the raw speed of that growth create an enormous market for solutions that can handle it. Currently this market is "up for grabs", because the major RDBMS environments just think differently and have different priorities, while smaller players are tied to their niche audiences. So, we (IT specialists) need to keep our eyes open - at some point "there can be only one" (c)!

Friday, April 19, 2013

Oracle Dynamic SQL: generic search - crazy case

For this Friday evening I decided to dig out an example from the most recent of my published books (Expert PL/SQL Practices, Apress, 2011) - it was a very fun project to work on: first, to come up with the idea of such a multi-author book; second, to write a chapter focused on Dynamic SQL.

As one of the examples, I decided to illustrate how XMLType can be used as both a collection of data elements and a collection of structural ones. Here is a function that lets any two parameters be passed in as filters (key point - ANY parameters, not only predefined ones!) - it is a slightly extended variation of what was in print.

create or replace FUNCTION f_searchXML_ref 
   (i_param_xml XMLTYPE:= 
      XMLTYPE( 
        '<param '||
        'col1_tx="DEPTNO" value1_tx="20" type1_tx="NUMBER" '||
        'col2_tx="JOB" value2_tx="CLERK" type2_tx="VARCHAR2"/>'
         )
    )
RETURN SYS_REFCURSOR
IS
    v_out_ref SYS_REFCURSOR;
    v_sql_tx VARCHAR2(32767);
BEGIN
  SELECT              
   'WITH param AS  ('||chr(10)||
   ' SELECT '||chr(10)||
   case EXTRACTVALUE (i_param_xml, '/param/@type1_tx') 
        when 'NUMBER' then 'TO_NUMBER('
        when 'DATE' then 'TO_DATE('
        else null
   end||
   'EXTRACTVALUE (in_xml, ''/param/@value1_tx'')'||
   case EXTRACTVALUE (i_param_xml, '/param/@type1_tx') 
        when 'NUMBER' then ')'
        when 'DATE' then ',''YYYYMMDD'')'
        else null
   end||' value1, '||chr(10)||    
   case EXTRACTVALUE (i_param_xml, '/param/@type2_tx') 
        when 'NUMBER' then 'TO_NUMBER('
        when 'DATE' then 'TO_DATE('
        else null
   end||
   'EXTRACTVALUE (in_xml, ''/param/@value2_tx'')'||
   case EXTRACTVALUE (i_param_xml, '/param/@type2_tx') 
        when 'NUMBER' then ')'
        when 'DATE' then ',''YYYYMMDD'')'
        else null
   end||' value2 '||chr(10)||
   ' FROM (SELECT :1 in_xml FROM DUAL) '||chr(10)||
   ' ) '||chr(10)||
   ' SELECT empno'||chr(10)||
   ' FROM scott.emp, '||chr(10)||
   '            param '||chr(10)||
   ' WHERE 1=1 '||chr(10)||
   case when EXTRACTVALUE (i_param_xml, '/param/@value1_tx') 
     is not null 
   then
       'and emp.'|| dbms_assert.simple_sql_name(
                    EXTRACTVALUE (i_param_xml, '/param/@col1_tx')
                                  )||'=param.value1 '||chr(10)
       else null
   end||
   case when EXTRACTVALUE (i_param_xml, '/param/@value2_tx') 
       is not null 
   then                                  
       'and emp.'|| dbms_assert.simple_sql_name(
                    EXTRACTVALUE (i_param_xml, '/param/@col2_tx')
                                 )||'=param.value2'
       else null
   end
   INTO v_sql_tx  FROM DUAL;
    
   dbms_output.put_line(v_sql_tx);    

   OPEN v_out_ref FOR v_sql_tx USING i_param_xml;

   RETURN v_out_ref;
END;
/


As you can see, the incoming XMLType contains a lot: column names (DEPTNO and JOB), values (20 and CLERK), and datatypes (NUMBER and VARCHAR2). These parameters are treated in different ways:

  • columns are extracted while I am building the SELECT statement. A special call to DBMS_ASSERT.SIMPLE_SQL_NAME makes sure that we are "code injection"-proof (see the quick check right after this list)
  • since by default XML contains text, if we want to properly identify datatypes we need to explicitly add TO_NUMBER or TO_DATE calls (for simplicity I assume that all dates will be in YYYYMMDD format)
  • data is passed as a single input variable into OPEN...FOR... This way we do not need to worry if some parameters are completely missing.
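A quick illustration of the DBMS_ASSERT point above (a standalone sketch, not part of the book example):

-- a clean identifier passes through untouched
select dbms_assert.simple_sql_name('DEPTNO') from dual;

-- an injection attempt is rejected before it ever reaches the generated SQL
-- (it fails with "ORA-44003: invalid SQL name")
select dbms_assert.simple_sql_name('DEPTNO=20 or 1=1') from dual;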
And the usage is very simple - the same as in the previous case. This call will find the EMPNO of all employees from department 20 who are clerks:


declare
    v_ref SYS_REFCURSOR;
    v_tt id_tt;
begin
    v_ref:=f_searchXML_ref();
   
    fetch v_ref bulk collect into v_tt;
    close v_ref;   
    dbms_output.put_line('Fetched:'||v_tt.count);
end;


If you want just department 20 - also no problem:

declare
    v_ref SYS_REFCURSOR;
    v_tt id_tt;
begin
    v_ref:=f_searchXML_ref(
            XMLTYPE( 
               '<param '||
               'col1_tx="DEPTNO" value1_tx="20" type1_tx="NUMBER"/>'
                 )
              );
   
    fetch v_ref bulk collect into v_tt;
    close v_ref;   
    dbms_output.put_line('Fetched:'||v_tt.count);
end;

Yes, I understand that this syntax is somewhat strange - but it is still worth knowing! And on that note - have a good weekend, everybody!

P.s. I decided to change the blog template - the original one was too narrow, and as a result code samples looked strange.


Thursday, April 18, 2013

Collaborate'13 Summary. Part 2. MySQL.. New David?

After making a number of purely technical posts, I decided to switch back today and continue describing Collab'13. The first post was about leadership, which was a somewhat non-trivial topic for what is normally known as a heavily technical conference. Now it's time to get more technical!

2. MySQL

As you might have noticed from my original conference schedule, I was really interested in learning a lot of completely new stuff. Good news - I've got a lot of things to review; bad news - some answers were not what I was expecting, and now I am struggling to fit them into the picture of a sane universe.

Let me start with MySQL. A lot of thanks to George Trujillo and Dave Stokes for providing tons of really deep insights on the role of this database in the contemporary IT environment. It was a big surprise for me to realize how much influence MySQL has gained in the corporate world - for example, as far as I understood, if you use Expedia.com, the search is done via MySQL while ordering is done via Oracle.

It seems that this dual implementation is slowly but steadily becoming "the new norm" - there is one major database implemented using a high-end solution, plus other side projects done via MySQL. The reasons usually fall into a few groups:
  • money: even big corporations think a lot about the cost of Oracle licenses. As a result, I've heard the following corporate policy from a number of people: "We do it in MySQL unless you can prove that it cannot be done there"
  • read optimization: let's keep the picture clear - the main purpose of MySQL from the very beginning was to bring data up quickly. Everything else is "nice to have", and the MySQL development team stays very focused on that vision (I've talked with them!)
  • flexibility: another big surprise for me was that MySQL is really a container for different storage engines. And there are dozens of them - each tuned in its own way. Yes, InnoDB is the closest to a "real" RDBMS with ACID compliance, memory processes, etc. - but in a lot of cases you just don't care about all of that overhead! Of course, you pay a price for it - from the questions in the audience I understood that a significant percentage of performance issues is usually related to the selection of the wrong storage engine. Sorry, that's the reverse side of the "flexibility" coin...

Summary: yes, MySQL has its limitations and issues - and it is not designed to replace the Oracle RDBMS, but for smaller/side projects it becomes a solid alternative. As a result, for anybody involved with Oracle databases it will sooner or later become a necessity to have a good idea of what MySQL is all about. Back to the books, ladies and gentlemen!

P.s. I am aware that Facebook runs on MySQL - and that MySQL can technically support big implementations. But in "the real corporate world" there are terabytes of data managed by RDBMSs - and thousands of specialists trained to do it well. For them MySQL can augment their tech stack, but it cannot replace it.

Oracle Dynamic SQL: generic search - REF CURSOR

This post is a continuation of a topic I raised previously - dynamic implementation of generic searches. As I mentioned in that post, there are circumstances when it is much more convenient to return a pointer to a row-set instead of the row-set itself. Or, in Oracle terms, a REF CURSOR instead of a collection.

Here is a variation of my search function that uses a REF CURSOR:

create or replace function f_search_ref
    (i_limit_nr number:=null,
     i_param_empno_nr   number:=null,
     i_param_ename_tx varchar2:=null,
     i_param_job_tx varchar2:=null)
return SYS_REFCURSOR
is
    v_sql_tx varchar2(32767);
    v_ref sys_refcursor;
begin
    -- opening
    v_sql_tx:=
     'declare '||chr(10)||
     '  lv_count_nr constant number:=:1;'||chr(10)||        
     '  lv_empno_nr constant number:=:2;'||chr(10)||
     '  lv_ename_tx constant varchar2(50):=:3;'||chr(10)||
     '  lv_job_tx  constant varchar2(50):=:4;'||chr(10)||
     'begin '||chr(10)||
    'open :5 for select empno from emp '||chr(10)||
        ' where rownum <=lv_count_nr ';

    -- i_param_empno
    if i_param_empno_nr is not null then
        v_sql_tx:=v_sql_tx||chr(10)||
                 ' and empno = lv_empno_nr ';
    end if;

    if i_param_ename_tx is not null then
        v_sql_tx:=v_sql_tx||chr(10)||
                 ' and ename like ''%''||lv_ename_tx||''%'' ';
    end if;

    if i_param_job_tx is not null then
        v_sql_tx:=v_sql_tx||chr(10)||
                 ' and job = lv_job_tx ';
    end if;

    -- closing
    v_sql_tx:=v_sql_tx||';'||chr(10)||
              'end;';

    dbms_output.put_line(v_sql_tx);

    execute immediate v_sql_tx
        using nvl(i_limit_nr,50),
              i_param_empno_nr,
              i_param_ename_tx,
              i_param_job_tx,
              v_ref;
              
    return v_ref;
end;

The key changes are highlighted:
  • The function returns SYS_REFCURSOR - a "weak" REF CURSOR that can point to any type of row-set, which is exactly what is needed here
  • To open that REF CURSOR I am using the "OPEN...FOR..." statement
  • Please notice that even though we "think" of the REF CURSOR as an output of our Dynamic SQL, for Oracle it is still an IN-parameter.
And here is an example of how that REF CURSOR can be used.

declare
    v_ref SYS_REFCURSOR;
    v_tt id_tt;
begin
    v_ref:=f_search_ref(10,null,null,null);
   
    fetch v_ref bulk collect into v_tt;
    close v_ref;
   
    dbms_output.put_line('Fetched:'||v_tt.count);
end;

The reason I've included it is clear - PLEASE do not forget to close REF CURSORs when you are finished using them! Resources are limited, so let's not waste them unnecessarily!

Wednesday, April 17, 2013

Oracle Dynamic SQL: generic search - simple case

It is a very common problem - how do you implement a generic search over a table? By generic search I mean that users are provided with tons of different options that can be used in all possible permutations.

This leads to a standard problem - how do you make sure that all of those searches are reasonably optimized? There are multiple schools of thought about it - a lot of contemporary database experts claim that the CBO has become so good that it can figure out everything. To be fair, I belong to the other group of people, who are a bit more skeptical and suggest that we should help Oracle at least somewhat.

Of course, there is always the DBMS_SQL package that can handle anything you can imagine, but a lot of developers (especially those envisioning future maintenance) prefer to stay with Native Dynamic SQL. That is definitely understandable, and some time ago I came up with a trick to do generic searches without DBMS_SQL.

In the example below I wrote a function that does a search on the SCOTT.EMP table:
  • three possible conditions (yes, I understand that the number of conditions may be unknown in advance - I have a slightly crazy example with passing XMLType as a list of parameters, but that is a bit more advanced. Let's start with the most straightforward case):
    • EMPNO - direct ID check
    • JOB - direct match
    • ENAME - LIKE condition
  • The return limit is defaulted to 50, but can be overwritten
    • Please keep in mind that it is just a sample - if you need to do real pagination, I would rather recommend using the same approach but returning a REF CURSOR as output. I will show this approach in a separate post.
    • Of course, keep in mind that loading thousands of objects into memory may or may not be a good thing :-)
  • Output is a list of primary keys
    • Because I wanted to have that function usable in both SQL and PL/SQL, I decided to implement this output as a collection of numbers.
From the conceptual standpoint, the idea is very simple (but a bit strange :-) ):
  • At the initial stage, I generate not SQL, but a PL/SQL block in which ALL possible parameters become local constants. Each constant gets a default value that is passed in as a bind variable.
  • At the same time (in this example) I am building the SELECT and FROM clauses of the query. Sometimes this step also has to be conditional, because depending on the parameters you may or may not need extra joins. To keep the case simple, for now we have a single-table search, so there is no need for any checks here.
  • I also initialize the WHERE clause with a mandatory condition (so later I can just append AND conditions without worrying about the syntax).
  • The second step involves spinning through all of the passed parameters and building extra conditions as needed. Here is the main trick - instead of referencing real bind variables, I reference the local constants that were created at the initial stage.
  • The last step is to close the PL/SQL block and fire EXECUTE IMMEDIATE. Please notice that I am passing ALL parameters - but in the SQL statement only SOME of the local constants will be used.
Overall, the described approach has a number of merits to be considered ;-) :
  1. you don't have to learn DBMS_SQL :-)
  2. you will still be using bind variables in all possible permutations
  3. you can directly see what you are trying to execute (very important for complex queries!)
create type id_tt is table of number;
/
-- search function
create or replace function f_search_tt 
    (i_limit_nr number:=null,
     i_param_empno_nr   number:=null,
     i_param_ename_tx varchar2:=null,
     i_param_job_tx varchar2:=null)
return id_tt
is
    v_sql_tx varchar2(32767);
    v_out_tt id_tt;
begin
    -- opening
    v_sql_tx:=
        'declare '||chr(10)||
        '  lv_count_nr constant number:=:1;'||chr(10)||        
        '  lv_empno_nr constant number:=:2;'||chr(10)||
        '  lv_ename_tx constant varchar2(50):=:3;'||chr(10)||
        '  lv_job_tx  constant varchar2(50):=:4;'||chr(10)||
        'begin '||chr(10)||
        ' select empno bulk collect into :5 from emp '||chr(10)||
        ' where rownum <=lv_count_nr ';

    -- i_param_empno
    if i_param_empno_nr is not null then
        v_sql_tx:=v_sql_tx||chr(10)||
                 ' and empno = lv_empno_nr ';
    end if;

    if i_param_ename_tx is not null then
        v_sql_tx:=v_sql_tx||chr(10)||
                 ' and ename like ''%''||lv_ename_tx||''%'' ';
    end if;

    if i_param_job_tx is not null then
        v_sql_tx:=v_sql_tx||chr(10)||
                 ' and job = lv_job_tx ';
    end if;

    -- closing
    v_sql_tx:=v_sql_tx||';'||chr(10)||
              'end;';

    dbms_output.put_line(v_sql_tx);

    execute immediate v_sql_tx
        using nvl(i_limit_nr,50),
              i_param_empno_nr,
              i_param_ename_tx,
              i_param_job_tx,
              out v_out_tt;

    dbms_output.put_line('Total rows found:'||v_out_tt.count);

    return v_out_tt;
end;
/

And now I can use this function in whatever pattern I want - exactly as specified!

-- get first 10 rows

select * from table(f_search_tt(10,null,null,null))
-- check ID
select * from table(f_search_tt(null,7566,null,null))
-- get only name search
select * from table(f_search_tt(null,null,'A',null))
-- get only job search
select * from table(f_search_tt(null,null,null,'MANAGER'))
-- get two conditions together
select * from table(f_search_tt(null,null,'A','MANAGER'))

Hope it helps! And thanks a lot to my Collab'13 friends who constantly keep me thinking about better ways of solving already well-known problems!