Detect Duplicates in a Table (Oracle)
This question is asked more times than one would imagine.
Question: How to detect duplicates in a table?
Answer:
There are multiple ways of doing things. Here are some of them.
Use of group by
select keys, count(*) from mytable
group by keys
having count(*) > 1
For Example:
SQL> select * from t1;
A B
---------- ----------
1 1
1 2
1 1
1 3
1 1
SQL> select a,b,count(*) from t1
2 group by a,b
3 having count(*)>1
4 /
A B COUNT(*)
---------- ---------- ----------
1 1 3
Use of ROWID
One way is to join the table to itself if it is a small table.
select * from t1 a, t1 b
where a.key = b.key and a.rowid != b.rowid
For Example:
SQL> create table t1 (a number, b number);
Table created.
SQL> insert into t1 values (1,1);
1 row created.
SQL> insert into t1 values (1,2);
1 row created.
SQL> insert into t1 values (1,1);
1 row created.
SQL> insert into t1 values (1,3);
1 row created.
SQL> commit;
Commit complete.
SQL> select * from t1;
A B
---------- ----------
1 1
1 2
1 1
1 3
SQL> select * from t1 a, t1 b
2 where a.a = b.a and a.b=b.b and a.rowid != b.rowid
3 /
A B A B
---------- ---------- ---------- ----------
1 1 1 1
1 1 1 1
This query however shows same record twice, i.e. one from “t1 a” and the other from “t1 b”.
The other way of doing it is by doing a sub-select on the same table. This query can be used in two ways. One way is to find all the rows that arrived into the table after the first one. The other way is to find all the rows that arrived into the table before the last one. The key is to switch between < and > signs.
select x.rowid, x.
from t1 x
where x.rowid > any (
select y.rowid
from t1 y
where x. = y.
)
For Example:
SQL> ed
Wrote file afiedt.buf
1 select x.rowid, x.a,x.b
2 from t1 x
3 where x.rowid>any (select y.rowid
5 from t1 y
6 where x.a = y.a and
7 x.b = y.b
8* )
SQL> /
ROWID A B
------------------ ---------- ----------
AAAGVNAB2AAAAQcAAA 1 1
SQL> insert into t1 values (1,1);
1 row created.
SQL> select x.rowid, x.a,x.b
2 from t1 x
3 where x.rowid>any (select y.rowid
5 from t1 y
6 where x.a = y.a and
7 x.b = y.b
8 );
ROWID A B
------------------ ---------- ----------
AAAGVNAB2AAAAQcAAA 1 1
AAAGVNAB2AAAAQcAAC 1 1
Let's change the sign from '<' to '>' now.
SQL> ed
Wrote file afiedt.buf
1 select x.rowid, x.a,x.b
2 from t1 x
3 where x.rowid>any (
4 select y.rowid
5 from t1 y
6 where x.a = y.a and
7 x.b = y.b
8* )
SQL> /
ROWID A B
------------------ ---------- ----------
AAAGVNAB2AAAAQcAAC 1 1
AAAGVNAB2AAAAQcAAE 1 1
Notice that Oracle left out the row with AAAGVNAB2AAAAQcAAE rowid first time and AAAGVNAB2AAAAQcAAA rowid second time.

Hi,
i wanna delete duplicate records of a table.
this needs to be done in db2.
i tried a sample query like select rowid from tablename
but got the error
db2 => SELECT ROWID FROM TABLE_A
SQL0206N “ROWID ” is not valid in the context where it is used.
SQLSTATE=42703
db2 =>
can u help me out on this.
even i have tried
Shan,
I apologize for a delayed reply. I am not so familiar with DB2 but here is what I found online. This would be a good learning opportunity for me too.
Here is what I have found so far.
**** Found on internet.****
As of DB2 V7, there is no equivalent to an Oracle ROWID. DB2 supports a data type called ROWID, but it is not the same as Oracle’s ROWID. You might want to use the DB2 IDENTITY property to simulate a row number. Of course, it will not be the same as with Oracle.
To use IDENTITY you would need to create the table with an integer column that is assigned the IDENTITY property. You can specify it such that DB2 always automatically generates a sequentially incremented number and assigns that to the column when data is inserted to the table. Then you would select that column in place of the Oracle ROWID. But beware, because the rows will not necessarily be stored contiguously on disk according to the identity column (unless you also specify that column in a clustering index). Also, the data is actually stored in the rows of the DB2 table.
**** ****
I tried similar code on my machine (now I have a DB2 instance running on a virtual machine.)
“Create the IDENTITY column”
ALTER table1 ADD ROWID INTEGER GENERATED ALWAYS AS IDENTITY
If you already have data than update ROWID column on all the rows with a sequence generated id.
If you don’t have data in your tables than use a sequence generated ID to populate the ROWID column.
“Remove the duplicates”
DELETE FROM table1
WHERE ROWID NOT IN
(SELECT Min(ROWID)
FROM table1
GROUP BY Col1, Col2, Col3)
“Avoid any more duplicates”
CREATE UNIQUE INDEX table1_UK ON table1(ROWID)
Hope this help
Or even better. Migrate to Oracle 10g. You will be happy.