Discussion:
Statistics and Reorg
(too old to reply)
1dg618
2011-11-10 23:12:01 UTC
Permalink
If I drop an index on a table and create a new index do I have to
update the statistics or "reorg" the table? My assumption is that I do?
Derek Asirvadem
2011-11-11 12:35:21 UTC
Permalink
Post by 1dg618
If I drop an index on a table and create a new index do I have to
update the statistics or "reorg" the table? My assumption is that I do?
The three functions are independent but related.

-----
REORG
-----
APL is truly clustered, and self-trimmed, REORG does not apply; it
only applies to DPL/DRL. And then, it fixes Level III Fragmentation
(which APL does not suffer).

- Drop/Create APL CI
Complete re-write of the data in the table, along with the CI B-Tree.
Corrects Level II Fragmentation

- Drop/Create DPL/DRL CI
As detailed in a previous response to you, there is no such thing as a
"clustered" index for DPL/DRL, there is a Placement Index, which
identifies the *initial* placement of rows. However in order to
address it, one must address the "clustered" index. So Drop/Create
DPL/DRL "CI" actually re-writes all the rows in the separate Heap data
structure. This correct Levels II & III Fragmentation. Therefore
Reorg is not required.

As INSERT/UPDATE/DELETE activity causes Level II & III Fragmentation,
one has the choice to Drop/Create "CI" xor REORG. It has several
flavours, the most strenuous being REBUILD, which basically rewrites
the entire Heap. REORG has the "feature" of being able to limit the
scope of the REORG to single partitions, etc, and can be run for short
durations, but it interferes with the dump tran cycle, and therefore
has to be run in the maintenance window. It depends on many things,
such as how well you have configured the server, but most people find
Drop/Create "CI" is faster than REORG REBUILD, and does a better job
of de-fragmentation.

- Drop/Create APL/DPL/DRL NCI
REORG is completely independent (handles the Heap, not the indices).

------------
UPDATE STATS
------------

This is completely independent of REORG.

Regardless of your request, Stats are created only for populated
tables.

The command has changed and improved over time, and there are several
flavours. The older flavour updated stats, ad created a histogram,
only for the leading column in the column list you give it. CREATE
INDEX automatically executes this UPDATE STATS. Therefore, if your
Key is single column, there is nothing more to do.

UPDATE STATS is pretty much demanded weekly. Most of us UPDATE STATS
on one seventh of the tables that change, every night, rolling through
the week.

DROP INDEX does not delete stats (DROP TABLE does). So if you DROP/
CREATE INDEX for a multi-column key, there is no need to UPDATE
STATS. The stats and Histogram for the leading column will be re-
written.

Therefore, because the stats are current, we never have to UPDATE
STATS after a DROP/CREATE INDEX or REORG.
1dg618
2011-11-11 20:33:43 UTC
Permalink
I have an index on a table. I run some intense queries against the
table. The times are the following: 5 minutes, 4 minutes, and 5
minutes. I drop the index. I re-create the index switching two columns
in the index. I run the same intense queries against the table. The
times are the following: 6 minutes, 7 minutes, and 7 minutes. I
figured the time could be explained by the statistics? I must have
missed something in a few of my previous posts. I thought the order of
the columns in the index would not matter; the optimizer would figure
it out?
Derek Asirvadem
2011-11-11 22:44:07 UTC
Permalink
Post by 1dg618
I must have
missed something in a few of my previous posts. I thought the order of
the columns in the index would not matter; the optimizer would figure
it out?
I specifically stated that the order of columns in an index *does*
matter. If the index you dropped/created was the clustered index,
then it matters even more. The difference could mean loss/gain of
Range Queries, resulting in much more than 20 to 50%.

I do not know exactly what you think the Optimiser can, or should,
figure out here. I have been careful to identify specifically what
the optimiser can do, under specific circumstances, indices, etc.

If you have an index (col_a, col_b), and those columns are not in any
other index, then:
- a query on col_a, or col_a and col_b, will be very fast
- for a query on col_b alone:
--- if there are good stats and a histogram on col_b, the Optimiser
will make good decisions, but there is no directly useful index; it
will be various levels of slow
--- if not, it will be the bottom end of slow

(Sybase is very, very fast overall, and you can configure all
resources that deal with I/O and memory. Depending on how well/badly
you have done that, the difference between a very fast query and a
table scan, may well be 5 mins vs 7 mins. Eg the 5 mins may be due to
poor config [if I configured your server, I may get that down to 1
min); or the 7 mins could be blazing fast. Yet another reason why
elapsed times are not enough to derive anything reasonable from; you
need to compare SHOWPLANS and STATS IO.)

If you drop that index, and create an index (col_b, col_a), then:
- a query on col_a, or col_a and col_b, will be very fast
- a query on col_b alone will be very fast

Regards
Derek

Loading...