Disclaimer
All copyrights, trademarks and service marks are hereby acknowleged. Any third-party trademarks are hereby acknowledged. All comments and opinions expressed by me are my own and not those of my employers. All comments and opinions expressed by visitors/members are their own. This site has no association with anyone and any advice, comments on products or support policies or tips and techniques are not endorsed by any corporation or my employer.
PL/SQL
This is a collection of basic information about Using Oracle PL/SQL. The best way to learn PL/SQL is to either read the Oracle Documentation or buy(and read) books on the subject. ‘Practise’ has not stopped making a man perfect.
Basic Structure of PL/SQL
PL/SQL stands for “Procedural Language extension to the SQL”. SQL is the most common and most popular way of querying of relational databases. It extends SQL by adding constructs found in procedural languages, resulting in a structural language that is more powerful than SQL. The basic unit in PL/SQL is a block. All PL/SQL programs are made up of blocks, which can be nested within each other. Each block performs a logical action in he program.
The following are a few defining characteristics of the language.
- Structured, Readable
- High Performance, Highly Integrated
- Embeded (Inside Oracle Environment.)
- Accessible, Standard, and Portable (for oracle development)
- One that I really like is “It’s Easy”
A PL/SQL block of code has the following features:
DECLARE
/* Declarative section: variables, types, and local subprograms. */
BEGIN
/* Executable section: procedural and SQL statements go here. */
/* This is the only section of the block that is required. */
EXCEPTION
/* Exception handling section: error handling statements go here. */
END;
TO BE CONTINUED…
Informatica
Here are my suggestions for someone who wants to start using and learning about Informatica Products.
Join http://my.informatica.com
Here, you will find all the documentation you need on all Informatica products. They also have a section called Knowledge Base where they log recommended solutions. You can browse through these solutions to find out how Informatica goes on about solving client’s problems.
Join http://tech.groups.yahoo.com/group/informaticadevelopment/
This group is recommended by Informatica as one of the primary sources of reference and help on Informatica. This is a yahoo group where many people work together to provide most logical and ideal solutions to the questions posted. I have also found it very helpful to read through previous messages to learn about different ways of solving an issue or probem.
Other than that, just start reading the manuals and start playing with the tools.
Detect Duplicates in a Table (Oracle)
This question is asked more times than one would imagine.
Question: How to detect duplicates in a table?
Answer:
There are multiple ways of doing things. Here are some of them.
Use of group by
select keys, count(*) from mytable
group by keys
having count(*) > 1
For Example:
SQL> select * from t1;
A B
---------- ----------
1 1
1 2
1 1
1 3
1 1
SQL> select a,b,count(*) from t1
2 group by a,b
3 having count(*)>1
4 /
A B COUNT(*)
---------- ---------- ----------
1 1 3
Use of ROWID
One way is to join the table to itself if it is a small table.
select * from t1 a, t1 b
where a.key = b.key and a.rowid != b.rowid
For Example:
SQL> create table t1 (a number, b number);
Table created.
SQL> insert into t1 values (1,1);
1 row created.
SQL> insert into t1 values (1,2);
1 row created.
SQL> insert into t1 values (1,1);
1 row created.
SQL> insert into t1 values (1,3);
1 row created.
SQL> commit;
Commit complete.
SQL> select * from t1;
A B
---------- ----------
1 1
1 2
1 1
1 3
SQL> select * from t1 a, t1 b
2 where a.a = b.a and a.b=b.b and a.rowid != b.rowid
3 /
A B A B
---------- ---------- ---------- ----------
1 1 1 1
1 1 1 1
This query however shows same record twice, i.e. one from “t1 a” and the other from “t1 b”.
The other way of doing it is by doing a sub-select on the same table. This query can be used in two ways. One way is to find all the rows that arrived into the table after the first one. The other way is to find all the rows that arrived into the table before the last one. The key is to switch between < and > signs.
select x.rowid, x.
from t1 x
where x.rowid > any (
select y.rowid
from t1 y
where x. = y.
)
For Example:
SQL> ed
Wrote file afiedt.buf
1 select x.rowid, x.a,x.b
2 from t1 x
3 where x.rowid>any (select y.rowid
5 from t1 y
6 where x.a = y.a and
7 x.b = y.b
8* )
SQL> /
ROWID A B
------------------ ---------- ----------
AAAGVNAB2AAAAQcAAA 1 1
SQL> insert into t1 values (1,1);
1 row created.
SQL> select x.rowid, x.a,x.b
2 from t1 x
3 where x.rowid>any (select y.rowid
5 from t1 y
6 where x.a = y.a and
7 x.b = y.b
8 );
ROWID A B
------------------ ---------- ----------
AAAGVNAB2AAAAQcAAA 1 1
AAAGVNAB2AAAAQcAAC 1 1
Let's change the sign from '<' to '>' now.
SQL> ed
Wrote file afiedt.buf
1 select x.rowid, x.a,x.b
2 from t1 x
3 where x.rowid>any (
4 select y.rowid
5 from t1 y
6 where x.a = y.a and
7 x.b = y.b
8* )
SQL> /
ROWID A B
------------------ ---------- ----------
AAAGVNAB2AAAAQcAAC 1 1
AAAGVNAB2AAAAQcAAE 1 1
Notice that Oracle left out the row with AAAGVNAB2AAAAQcAAE rowid first time and AAAGVNAB2AAAAQcAAA rowid second time.
Numbers to words!!!
A simple way of spelling out numbers to words. This could be handy for printing reciepts, cheques etc.
KDE>select to_char( to_date( abs(&num),'J'),'Jsp') "NUMBER TO TEXT" from dual
2 /
Enter value for num: 345
old 1: select to_char( to_date( abs(&num),'J'),'Jsp') "NUMBER TO TEXT" from dual
new 1: select to_char( to_date( abs(345),'J'),'Jsp') "NUMBER TO TEXT" from dual
NUMBER TO TEXT
------------------------
Three Hundred Forty-Five
Or You can use it on a column like
SELECT sal, TO_CHAR(TO_DATE(sal,'j'), 'jsp') FROM emp;
800.00 eight hundred
2,850.00 two thousand eight hundred fifty
2,450.00 two thousand four hundred fifty
5,000.00 five thousand
3,000.00 three thousand
2,975.00 two thousand nine hundred seventy-five
1,250.00 one thousand two hundred fifty
…
…
hope this helps…
Audit DDL Changes
Here is a way of keeping track of all the changes made to a schema by DDL statements.
DROP TRIGGER DDL_CHANGE_TRACKER
/
DROP TABLE dll_audit_log
/
CREATE TABLE ddl_change_tracker (
stamp DATE,
username VARCHAR2(30),
osuser VARCHAR2(30),
machine VARCHAR2(30),
terminal VARCHAR2(30),
operation VARCHAR2(30),
objtype VARCHAR2(30),
objname VARCHAR2(30))
/
CREATE OR REPLACE TRIGGER audit_ddl_changes
AFTER create OR drop OR alter
ON KIRTAN.SCHEMA
BEGIN
INSERT INTO DDL_CHANGE_TRACKER VALUES
(SYSDATE,
SYS_CONTEXT('USERENV', 'SESSION_USER'),
SYS_CONTEXT('USERENV', 'OS_USER'),
SYS_CONTEXT('USERENV', 'HOST'),
SYS_CONTEXT('USERENV', 'TERMINAL'),
ORA_SYSEVENT,
ORA_DICT_OBJ_TYPE,
ORA_DICT_OBJ_NAME
);
END;
/
show errors
CREATE TABLE my_test (a number)
/
DROP TABLE my_test
/
set pages 50000
SELECT * FROM DDL_CHANGE_TRACKER
/
Informatica PowerCenter Connect for Web Services
(You may find similar contents in Informatica Documentation. I have borrowed some description from the manuals to keep single version of truth alive.)
Overview:
Informatica PowerCenter Connect for Web Services works as a consumer of a Web Service. You can use PowerCenter Connect for Web Services to integrate with PowerCenter to read data from a web service source and write data to a web service target. You can also transform data during a session using PowerCenter Connect for Web Services.
For example, in a financial system where you are required to submit only ‘validated’ data to your financial application (eg. peoplesoft). Your financial application may provide you with a functionality that lets you validate incoming data. It may provide you with a web services oriented functionallity to do that. In such case, while processing incoming data for some other validation, you would use Web Services Consumer Transformation using PowerCenter Connect for Web Services to send and recieve messages from the provided web service.
A web service is a set of operations that you can ‘consume’ over a intranet or the internet. You can look at a web service as one or more functions that return results. When you access a web service, you request that the web service perform an operation and return data. A web service can contain many web service operations.
Web service operations contain input and output messages. These messages are XML-formatted messages. They specify how to structure a request for a web service. Web service access involves providers and consumers. A web service provider refers to the server that hosts the web service. A web service consumer refers to the client that requests a web service. PowerCenter Connect for Web Services accesses web services as a web service consumer.
The web service you access can be remote or local. Someone at another organization can create and publish the web service, or someone at your organization can create and publish it.
You only use PowerCenter Connect for Web Services as a web service consumer. To expose a PowerCenter workflow as a web service and make it available to others, you use PowerCenter Web Services Provider.
To read data from a web service, write data to a web service, or transform data using a web service, you must import a web service operation. You can import a web service operation from a Web Services Description Language (WSDL) file. WSDL files describe web services and web service operations. PowerCenter Connect for Web Services uses the information in the WSDL file to access a web service operation.
PowerCenter Connect for Web Services uses the Simple Object Access Protocol (SOAP). SOAP is a protocol for exchanging information between computers. It specifies how to encode XML data so that programs on different operating systems can pass information to each other.
Web services hosts contain WSDL files and web services.
Coming soon : WSDL Files
Informatica Performance Tuning
Informatica Performance Tuning
I have gathered this list over time while working at different client sites. It is not necessarily my list. But it definitely is a good list. If you don’t agree with anything below, don’t follow it. Anyways, here it goes….
INFORMATICA BASIC TUNING GUIDELINES
The following points are high-level issues on where to go to perform “tuning” in Informatica’s products. These are, in no way, permanent problem solvers, nor are they the end-all solution. Just some items (which if tuned first) might make a difference. The level of skill available for certain items will cause the results to vary.
To ‘test’ performance throughput it is generally recommended that the source set of data produce about 200,000 rows to process. Beyond this – the performance problems / issues may lie in the database – partitioning tables, dropping / re-creating indexes, striping raid arrays, etc… Without such a large set of results to deal with, your average timings will be skewed by other users on the database, processes on the server, or network traffic. This seems to be an ideal test size set for producing mostly accurate averages.
Try tuning your maps with these steps first. Then move to tuning the session, iterate this sequence until you are happy, or cannot achieve better performance by continued efforts. If the performance is still not acceptable, then the architecture must be tuned which can mean changes to what maps are created). In this case, you can contact us – we tune the architecture and the whole system from top to bottom.
KEEP THIS IN MIND: In order to achieve optimal performance, it’s always a good idea to strike a balance between the tools, the database, and the hardware resources. Allow each to do what they do best. Varying the architecture can make a huge difference in speed and optimization possibilities.
Utilize a database for significant data handling operations (such as sorts, groups, aggregates). In other words, staging tables can be a huge benefit to parallelism of operations. In parallel design – simply defined by mathematics, nearly always cuts your execution time.
Localize. Localize all target tables on to the SAME instance of Oracle (same SID), or same instance of Sybase. Try not to use Synonyms (remote database links) for anything (including: lookups, stored procedures, target tables, sources, functions, privileges, etc…). Utilizing remote links will most certainly slow things down. For Sybase users, remote mounting of databases can definitely be a hindrance to performance.
If you can – localize all target tables, stored procedures, functions, views, and sequences in the SOURCE database. Again, try not to connect across synonyms. Synonyms (or remote database tables) could potentially affect performance by as much as a factor of 3 times or more.
Remember that Informatica suggests that each session takes roughly 1 to 1 1/2 CPU’s. In keeping with this – Informatica play’s well with RDBMS engines on the same machine, but does NOT get along (performance wise) with ANY other engine (reporting engine, java engine, OLAP engine, java virtual machine, etc…)
Remove any database based sequence generators. This requires a wrapper function / stored procedure call. Utilizing these stored procedures can cause performance to drop by a factor of 3 times. This slowness is not easily debugged – it can only be spotted in the Write Throughput column. Copy the map, replace the stored proc call with an internal sequence generator for a test run – this is how fast you COULD run your map. If you must use a database generated sequence number, use them as part of the source qualifier query. If you’re dealing with GIG’s or Terabytes of information – this should save you lot’s of hours tuning.
TURN OFF VERBOSE LOGGING. The session log has a tremendous impact on the overall performance of the map. Force over-ride in the session, setting it to NORMAL logging mode. Unfortunately the logging mechanism is not “parallel” in the internal core, it is embedded directly in to the operations.
Turn off ‘collect performance statistics’. This also has an impact – although minimal at times – it writes a series of performance data to the performance log. Removing this operation reduces reliance on the flat file operations. However, it may be necessary to have this turned on DURING your tuning exercise. It can reveal a lot about the speed of the reader, and writer threads.
Try to eliminate the use of non-cached lookups. By issuing a non-cached lookup, your performance will be impacted significantly. Particularly if the lookup table is also a “growing” or “updated” target table – this generally means the indexes are changing during operation, and the optimizer looses track of the index statistics. Again – utilize staging tables if possible. In utilizing staging tables, views in the database can be built which join the data together; or Informatica’s joiner object can be used to join data together – either one will help dramatically increase speed. The thing with Informatica’s Joiner Transformation is that it needs caching of the data to scan through it to find matches. In such cases, make sure you tweak the cache sizes. Unless your source tables are from different RDBMS, let the RDBMS join them.
Separate complex maps – try to break the maps out in to logical threaded sections of processing. Re-arrange the architecture if necessary to allow for parallel processing. There may be more smaller components doing individual tasks, however the throughput will be proportionate to the degree of parallelism that is applied.
Balance between Informatica and the power of SQL and the database. Try to utilize the DBMS for what it was built for reading / writing / sorting / grouping / filtering data en-masse. Use Informatica for the more complex logic, outside joins, data integration, multiple source feeds, etc… The balancing act is difficult without DBA knowledge. In order to achieve a balance, you must be able to recognize what operations are best in the database, and which ones are best in Informatica. This does not degrade from the use of the ETL tool, rather it enhances it – it’s a MUST if you are performance tuning for high-volume throughput.
TUNE the DATABASE. Don’t be afraid to estimate: small, medium, large, and extra large source data set sizes (in terms of: numbers of rows, average number of bytes per row), expected throughput for each, turnaround time for load, is it a trickle feed? Give this information to your DBAs and ask them to tune the database for “worst case”. Help them assess which tables are expected to be high read/high write, which operations will sort (order by) etc. Moving disks, assigning the right table to the right disk space could make all the difference. Utilize a PERL script to generate “fake” data for small, medium, large, and extra large data sets. Run each of these through your mappings – in this manner, the DBA can watch or monitor throughput as a real load size occurs.
Be sure there is enough SWAP, and TEMP space on your PMSERVER machine. Not having enough disk space could potentially slow down your entire server during processing (in an exponential fashion). Sometimes this means watching the disk space while your session runs. Otherwise you may not get a good picture of the space available during operation. Particularly if your maps contain aggregates, or lookups that flow to disk Cache directory or if you have a JOINER object with heterogeneous sources.
Place some good server load monitoring tools on your PMServer in development – watch it closely to understand how the resources are being utilized, and where the hot spots are. Try to follow the recommendations – it may mean upgrading the hardware to achieve throughput.
TWEAK SESSION SETTINGS. In the session, there is only so much tuning you can do. Balancing the throughput is important – by turning on “Collect Performance Statistics” you can get a good feel for what needs to be set in the session – or what needs to be changed in the database. Read the performance section carefully in the Informatica manuals. Basically what you should try to achieve is: OPTIMAL READ, OPTIMIAL THROUGHPUT, and OPTIMAL WRITE. Over-tuning one of these three pieces can result in ultimately slowing down your session.
Try to merge expression objects, set your lookups to unconnected (for re-use if possible), check your Index and Data cache settings if you have aggregation, or lookups being performed. Etc… If you have a slow writer, change the map to a single target table at a time – see which target is causing the “slowness” and tune it. Make copies of the original map, and break down the copies. Once the “slower” of the N targets is discovered, talk to your DBA about partitioning the table, updating statistics, removing indexes during load, etc… There are many database things you can do here.
Remove all other “applications” on the PMServer. Except for the database / staging database or Data Warehouse itself. PMServer plays well with RDBMS (relational database management system) – but doesn’t play well with application servers, particularly JAVA Virtual Machines, Web Servers, Security Servers, application, and Report servers. All of these items should be broken out to other machines. This is critical to improving performance on the PMServer machine.
Hope this helps.
Kirtan Desai
Updates
Dan Linstedt from Myers-Holum Inc made a few comments a few days ago on this post. I have added his comments below.
- If you use a joiner, always use a sorted joiner – the cost of sorting data coming from the RDBMS staging area is frequently less than the cost of building the caches in place. Furthermore, if you are NOT on 64 bit Informatica, or you don’t have unlimited (seemingly unlimited) RAM, you have an upper limit to the caching mechanisms in all the cached objects – including Joiner.
- If the RDBMS is NOT tuned properly, putting more work into the RDBMS will actually slow things down.
- Too many Instances of an RDBMS on the SAME MACHINE will actually kill performance rather than help it, and it really is not necessary in order to handle fail-over as many set it up to do.
- Replacing a Lookup with a SORTED JOINER can improve performance dramatically.
- The manuals have the formula wrong for the Data / Index Cache settings, even though it’s counter-intuitive, you want 100% of the INDEX cached if you can get it, giving up Data Cache for disk. Why? Because if you can’t access RAM to check for the existence of data you are actually increasing I/O – when you increase I/O you slow performance dramatically. Set your Index caches to twice the size of your data caches to be safe.
There are many variables which can change the course of one’s actions. Feel free to ask questions on this blog.
About
This site is developed and maintained by me, Kirtan Desai. This is a personal blog.
These pages can be best viewed in 1024 x 768 resolution.
ora-01555
I have been dealing with ora-01555 all day. I guess I was one of the victims of misconception about the rollback segments. Here is a small description of what I understand of “01555” now.
There are three situations that can cause the ORA-01555 error:
1. An active database with an insufficient number of small-sized rollback segments
2. A rollback segment corruption that prevents a consistent read requested by the query
3. A fetch across commits while your cursor is open
Here are some simple rollback segment rules, as per Oracle’s documentation:
1. A transaction can only use one rollback segment to store all of its undo records.
2. Multiple transactions can write to the same extent.
3. The head of the rollback segment never moves into a rollback extent currently occupied by the tail.
4. Extents in the ring are never skipped over and used out of order as the head tries to advance.
5. If the head can’t use the next extent, it allocates another extent and inserts it into the ring.
Two issues that you need to consider when deciding whether your segment is large enough to fit the transaction:
1. Make sure that transactions will not cause the head to wrap around too fast and catch the tail.
2. If you have long-running queries that access frequently changing data, make sure that the rollback segment doesn’t wrap around and prevent the construction of a read-consistent view.
From a developer’s perspective, you can restructure your PL/SQL code to avoid fetching across commits that cause the ORA-01555 error.
these are general guidelines. one must read the oracle documentation to implement/follow these guidelinee.
