Re: 10g - Need advise on large lookup table and optimizing io
- From: DA Morgan <damorgan@xxxxxxxxx>
- Date: Sat, 07 Jan 2006 13:46:45 -0800
peter wrote:
Folks, I inherited an application that has 1 very large lookup table. This lookup table has a 90/10 read/write ratio. Most of the time it's read heavily, but during uploads records that don't currently exist in that table get inserted. This table looks like this.
USER_ID| EMAIL | CLIENT_ID | DUPLICATE_ID
------------------------------------------------------------------------ 1000000 'email1' 12312 NULL 1000001 'email2' 13434 1
Anyway this table has 400 million records an is not partitioned. He way 2 main types of queries against this table.
SELECT email from THISTABLE where DUPLICATE_ID IS NULL and user_id = ?
The other type of query executed is a SELECT where this table is simply joined on USER_PROF_ID to pull the email address.
SELECT A.EMAIL, B.COL1,B.COL2 FROM DEMOGRAPHICS B, THISTABLE A WHERE B.USER_PROF_ID = A.USER_PROF_ID
We now have indexes on the "THISTABLE" so that the actual data is never read. Only the indexes in place are used to satisfy all queries. The above SELECT query like many typical queries on our system pulls lots of data (500K records to 1 million records).
We have one specific index on the columns USER_ID|EMAIL|DUPLICATE_ID|CLIENT_ID that we use to satisfy all the large select queries...these queries do an INDEX RANGE SCAN on the index columns I just described ..and so never need to access the table rows.
Because the table/indexes have nearly 400 million records in them, and this index is read my all our "larger" select queries, this index is contented for and the IO waits are high.
I know that this data model is not great, but i inherited it and can't really change too mch now. But I'm wondering what I can do to speed up the lookups during the SELECT queries against that very large index ( Index now has 3 levels).
I've been doing some testing with a single table hash cluster, but I'm not sure if the benefits of a hash cluster outperform the benefits (in terms of IO) of looking up the rows in the large index and not having to read the table data.
If anyone out there has any input, I'd greatly appreciate it.
--peter
A couple of things I'd like to know about the tables ... how many
different user_ids and user_prof_ids in those 400M records. Is the
data skewed? And I'd be inclined to take a very harsh look at "DUPLICATE_ID IS NULL" and this is forcing a full table scan.
Starting with DUPLICATE_ID ... can you use a default value in place NULL? If not consider a function based index and query for the value created by the function used to build the index.
With respect to the USER_ID = ... my first instinct would be to convert the table to either a sorted hash cluster (10g) or Index-Organized Table.
With respect ot USER_PROF_ID ... I'd need to know more about the data.
All of the above is just pure instinct and I wouldn't do any of it without tracing and testing extensively.
HTH -- Daniel A. Morgan http://www.psoug.org damorgan@xxxxxxxxxxxxxxxx (replace x with u to respond) .
- Follow-Ups:
- References:
- Prev by Date: Re: Question about efficient ways to pass data from java to oracle
- Next by Date: Re: GUID Generation
- Previous by thread: 10g - Need advise on large lookup table and optimizing io
- Next by thread: Re: 10g - Need advise on large lookup table and optimizing io
- Index(es):
Relevant Pages
|