query plan: Optimized using the forced options (internally generated Abstract Plan)

Discussion:

(too old to reply)

rebelde

2011-10-14 12:04:12 UTC

Hello,

We moved recently a lot of our installations at customer side from ASE 12.5.4 to ASE 15.0.3 ESD#4
(on Solaris SPARC or SuSE SLES11). All customers complain about some kind of performance degree,
especially while unloading large amount of rows with some ESQL/C application to file system.

At the beginning of this migration phase early this year, this was even more dramatically,
until we learned that for our ESQL/C application we must use some kind of Optimizer fallback to 12.5 as

sp_configure "optimization goal", 0, allrows_oltp
sp_configure 'enable compatibility mode', 1

This made production again possible, but the complaint about for example "unloading takes
twice as much as before in 12.5" remained; we now digged deeper into this and it turned
out that the rumor is true; we installed both (12.5 and 15.0.3) on the same idle Linux server
and with the same databases and can measure that

1. unloading 1000 rows does take twice as the time before, even if all rows are in cache
(because we repeat the unload)

2. the ESQL/C client is waiting in poll(2) for the ASE response; here is was happens
on syscall level for unloading one row (by a SELECT based on a Index), note the 1 ms sleep
in poll marked with ^^^^^^^^^):

0.000204 send(5, "\17\1\0\317\0\0\0\0b\302\0\0\0\1\0\rsid_koe_daten\256\0\0\0create proc sid_koe_daten as SELECT SYB_IDENTITY_C
OL,*,datalength(nettodaten) FROM koe_daten (i"..., 207, 0) = 207
0.001053 read(5, "\4\1\1\213\0\0\0\0", 8) = 8
0.000048 read(5, "b\20\0\0\0 \0\rsid_koe_daten \r\0\0\0\1\0\0\0\0\0\0\0\0\0\0008\0aN\1\0\0\10\0\0\5amper\5sisis\tkoe_daten\20SY
B_IDENTITY_COL@\0\0\0\n\0\0\0l\6\n\0\0\0\5amper\5sisis\tkoe_daten\6katke"..., 387) = 387
0.000165 send(5, "\17\1\0000\0\0\0\0##\0\0\0\rkoe_daten_seq\10\0\r\0\0\0sid_koe_daten\0\0", 48, 0) = 48
0.000126 read(5, "\4\1\0\33\0\0\0\0", 8) = 8
0.000052 read(5, "\203\7\0\1\0\20\0\3\25\0\375\0\0\2\0\0\0\0\0", 19) = 19
0.000102 send(5, "\17\1\0)\0\0\0\0\204\5\0\1\0\20\0\1 \16\0\0\0\1\0\0\0\0\0\0\0\0\0\0&\4\0\327\4\1\0\0\0", 41, 0) = 41
0.000910 read(5, 0x806753c, 8) = -1 EAGAIN (Resource temporarily unavailable)
0.000062 poll([{fd=5, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
^^^^^^^^^^^^^
0.001109 read(5, "\4\1\1y\0\0\0\0", 8) = 8
^^^^^^^^^^^^^
0.000065 read(5, "aN\1\0\0\10\0\0\5amper\5sisis\tkoe_daten\20SYB_IDENTITY_COL@\0\0\0\n\0\0\0l\6\n\0\0\0\5amper\5sisis\tkoe_date
n\6katkey \0\0\0\7\0\0\0&\4\0\0\5amper\5sisis\tkoe_daten\6akt"..., 369) = 369
0.000281 send(5, "\17\1\0\20\0\0\0\0\202\5\0\1\0\20\0\1", 16, 0) = 16
0.000088 read(5, 0x806753c, 8) = -1 EAGAIN (Resource temporarily unavailable)
0.000053 poll([{fd=5, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
0.000091 read(5, "\4\1\0\312\0\0\0\0", 8) = 8
0.000059 read(5, "\321\6\0\0\0\0\r1\4\1\0\0\0\2\0\0\1 \nX820086590,ABSATZWIRTSCHAFTLICHE GESELLSCHAFT NUERNBERG\20\262\7\0\0\0\
0\0\0\0\0\0\0v\251!\3\0\0\0\0\3\331\\\245M\0\0\0\0\1\0- Absatzwirtschaftlich"..., 194) = 194

using the 12.5 ASE this poll(2) is not there and it continues just with read(2)
reading the ASE response from socket;

3. while simulating a single SELECT with isql it turned out that there is an important
difference between the showplan of 12.5 and 15.0.3; this is for 15.0.3 here:

1> begin transaction
1> set showplan on

1> DECLARE koe_daten_seq CURSOR FOR SELECT SYB_IDENTITY_COL,* FROM koe_daten (index koe_katkey) WHERE katkey=1

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is DECLARE CURSOR.

1> open koe_daten_seq

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is OPEN CURSOR koe_daten_seq.

QUERY PLAN FOR STATEMENT 1 (at line 1).
Optimized using the forced options (internally generated Abstract Plan).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

STEP 1
The type of query is DECLARE CURSOR.
^^^^^^^^^^^^^^^^^^^^^^^^

the 12.5 does not contain the marked lines, but just

...
1> open koe_daten_seq

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is OPEN CURSOR koe_daten_seq.

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is SELECT.

FROM TABLE
koe_daten
Nested iteration.
Index : koe_katkey

The "Statement-level output" chapture of ASE explains the message of
"Optimized using the forced options (internally generated Abstract Plan)"
only for 'alter table' or 'reorg commands' (page 44).

What does this mean? Thanks in advance

matthias

Derek Asirvadem

2011-10-14 14:04:33 UTC

Permalink

Post by rebelde
What does this mean? Thanks in advance

a. The AQP is generated internally, to override the normal QP,
because you have specified:
___(INDEX koe_katkey)
normally called "force index" or "index hints", and the way that is
expedited, is to generate an AQP. Nothing surprising about that. the
optimiser was rewritten in 15.0, so there are many things in it, and
in its showplans, that will not be found in 12.5. The underlined
lines are to be expected, that is not what is keeping this slow.

b. I do not understand why you expect 12.5 and 15.0 binaries to match,
syscall for syscall. Forget about that. Then you need to consider
that ASE was very nicely bound to Sun/SPARC, you cannot expect the
same from little old Suse, all the drivers are different.

c. That (@3) is not a "single SELECT", it is a cursor, that fetches
one row. A cursor is a particularly slow method of interacting with
ASE. It is a set-oriented engine, not a row-processor. Even if you
tweaked the configuration, it would not help an app that is sitting
there, calling for one row at a time, processing it, calling for the
next row.

d. Although there have been many enhancements recently, to move large
amounts of rows via client-side apps (eg. bulk-copy; arrays), (i) I
doubt that ESQL/C was enhanced and (ii) you would have to change a
little code to take advantage of such. So right now, you are
executing row-processing commands, on the slowest CT-Lib library,
without any code changes to take advantage of the new facilities. On
a new smaller box. Using massively changed ASE.

e. Why, exactly, are you using ESQL/C (or any client-side program for
that matter), to unload large volumes of data, to a file system
file ? Do you know, there are server-side utilities that perform that
task 1,000s of times faster, and do not use the network ? If you must
transfer data across the network, there are client-side utilities that
perform that task 100s of times faster.

rebelde

2011-10-14 14:31:38 UTC

Permalink

Firts of all, thanks for your fast and detailed answer.

Post by Derek Asirvadem

Post by rebelde
What does this mean? Thanks in advance

a. The AQP is generated internally, to override the normal QP,
___(INDEX koe_katkey)
normally called "force index" or "index hints", and the way that is
expedited, is to generate an AQP. Nothing surprising about that. the
optimiser was rewritten in 15.0, so there are many things in it, and
in its showplans, that will not be found in 12.5. The underlined
lines are to be expected, that is not what is keeping this slow.

I understand, and deleting '(INDEX koe_katkey)' let go away the line
about 'Optimized AP' from the showplan output; but it continues to say:

1> open koe_daten_seq

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is OPEN CURSOR koe_daten_seq.

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is DECLARE CURSOR.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

whiloe in 12.5.4 it says:

1> open koe_daten_seq

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is OPEN CURSOR koe_daten_seq.

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is SELECT.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

i.e. in 12.5.3 it says 'query is SELECT' and in 15.0.3 it declares a new
CURSOR: 'query is DECLARE CURSOR'; why?

Post by Derek Asirvadem
b. I do not understand why you expect 12.5 and 15.0 binaries to match,
syscall for syscall. Forget about that. Then you need to consider
that ASE was very nicely bound to Sun/SPARC, you cannot expect the
same from little old Suse, all the drivers are different.

I do not expect that syscalls match, but I see the 0.0011 sec delay (spent
in poll(2)) which underpins that 15.x ASE needs longer at this point;

Post by Derek Asirvadem
one row. A cursor is a particularly slow method of interacting with
ASE. It is a set-oriented engine, not a row-processor. Even if you
tweaked the configuration, it would not help an app that is sitting
there, calling for one row at a time, processing it, calling for the
next row.

Normally the hit list of the SELECT is not only one row, but thousands; and
it must be a CURSOR in the app to move back and forth (think in paging back
and forth in some kind of sorted dictionary);

Post by Derek Asirvadem
d. Although there have been many enhancements recently, to move large
amounts of rows via client-side apps (eg. bulk-copy; arrays), (i) I
doubt that ESQL/C was enhanced and (ii) you would have to change a
little code to take advantage of such. So right now, you are
executing row-processing commands, on the slowest CT-Lib library,
without any code changes to take advantage of the new facilities. On
a new smaller box. Using massively changed ASE.

We are talking about a high complex Library Management System with some 500
tables, written in C and ESQL/C and we can't so easy adopt on new
Optimizer. At least we expect to get the old performance with the new ASE.
Is this to much expectation? :-)

Post by Derek Asirvadem
e. Why, exactly, are you using ESQL/C (or any client-side program for
that matter), to unload large volumes of data, to a file system
file ?

Because the netto data is stored in BLOBS which must be read and interpreted
by the application; and: this (unloading) is only an example where the
performance degree is easy to proof. In the dialog oriented part of the
apps using the same interface it is only to 'feel' but not so easy to
proof; that's why I picked this up;

Post by Derek Asirvadem
Do you know, there are server-side utilities that perform that
task 1,000s of times faster, and do not use the network ? If you must
transfer data across the network, there are client-side utilities that
perform that task 100s of times faster.

I know, but this does not help here.

The main question is: why is the same ESQL/C app with 15.0.3 twice as time
slower reading rows as 12.5.4?

Thanks

matthias

Derek Asirvadem

2011-10-15 04:32:36 UTC

Permalink

Post by rebelde
Firts of all, thanks for your fast and detailed answer.

My pleasure.

Post by rebelde
The type of query is DECLARE CURSOR.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The type of query is OPEN CURSOR koe_daten_seq.
The type of query is SELECT.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
i.e. in 12.5.3 it says 'query is SELECT' and in 15.0.3 it declares a new
CURSOR: 'query is DECLARE CURSOR'; why?

I think I mentioned, there is no possibility of matching either the
QP, or the ShowPlan of the QP, line by line. All this Showplan shows,
is that 15.x produces two separate notices in the ShowPlan, where
12.5.3 produces one notice. There is no "new" or second cursor, it is
notifying the OPEN as well as the DECLARE.

Every command submitted to ASE is a "query". Every new command starts
a STEP 1.

There are many diffs between the Showplans. Some are definitely worth
investigation, but you must use a context, a view from 1,000 metres,
and examine the Showplan for the actual SELECT, etc, which is the
operative part of the Showplan. What you are doing here, is examining
the hair of the beast with a magnifying glass, and worrying about the
observation that the hair is a different colour. That observation,
while correct, is irrelevant. We want the flesh of the deer, not the
hair.

Now if you find a difference (not in no of logged entries, but a
material difference) in the two ShowPlans re the two SELECTS within
the cursor, that is worth discussing.

Post by rebelde
I do not expect that syscalls match, but I see the 0.0011 sec delay (spent
in poll(2)) which underpins that 15.x ASE needs longer at this point;

Two answers.

1. So ?

An XYZ binary written in 1994 (plus 15 years of patches) goes through
one code path on a Sun/SPARC, and a PQR binary written in 2007 goes
through another code path on an Intel. So ? Both will achieve the
same functional result. There is no basis to compare them.

2. If you forget about comparing the two, and just look at the problem
you have on the new box, there may be some work to do. The WAIT, is
because every time the app calls for one row, and then ASE furnishes
it to the app, ASE correctly identifies that the batch is complete,
takes the process off the RUN queue, and places it in the SLEEP
queue. Then it waits around for the app to to wake up, process the
single row, send ASE another batch (of SELECT one row). During that
time (which is very long when cursors are used), if ASE is idle, it
will itself go to SLEEP, and need to be woken up. You may well have
determined that ASE takes 0.0011 sec delay to wake up and process the
new batch.

2.1 Excellent. Go and do something about it. Change Linux to treat
ASE like a server and give it priority, allocated memory, never to
swap out, etc.

2.2 But do you want to change the config of ASE ? On the basis of one
stupid cursor ? Performance tuning has to take consideration of the
whole load; all the spids; and divide resources accordingly. If you
change config based on one or the other single event, the whole is
likely to suffer bad performance. Therefore if you want to look at
this, you need to look at the overall performance and the overall
config of ASE, not the performance of one spid or where it is getting
stuck.

The old transport is the ICE from Ulm to Frankfurt in 2 hours 40 mins,
it is designed to move 1,000 people. You want one person to fly.
Sure, that can be done, but you need to make arrangements at
Weisbaden, the usual arrangements at the Hauptbahnhof will not do.

2.3 Also note that you are not looking at the action of the spid and
ASE from 1,000 metres or 10,000 metres, you are using a magnifying
glass. That will result in you concentrating on specifics, while
missing the whole context.

2.4 15.x has 250 config parms, they are inter-related. Additionally
almost every system resource can be configured. You should not expect
to tweak one or the other config parm without affecting the related
parms, the related parms should be changed in concert.

2.5 I am assuming you have read the Sybase literature on Migration
from 12.x to 15.x, and you have set the config accordingly. Also that
it is configured to take full advantage of the resources available on
the SuSe box, not configured for the old Sun.

2.6 Have you read that you should set:

____EXEC sp_configure "runnable process search count", 50 -- not 5,
not 2000

This will affect all spids, not just the cursor spid. Test the hell
out of it, and adjust accordingly.

2.7 To close this 0.0011 sec delay item, sure, chase it down. But do
not be surprised if it does not help the cursor spid that much.

Post by rebelde
Normally the hit list of the SELECT is not only one row, but thousands; and
it must be a CURSOR in the app to move back and forth (think in paging back
and forth in some kind of sorted dictionary);

Yes, but it is "paging back and forth" one row at a time. You are
processing thousands of calls for a single row each time; set-
processing means one single call which processes thousands of rows.

Send me (via email, do not post here) the SELECTs (both the outer
SELECT for the CURSOR, and the inner SELECT for the paging-back-and-
forth, plus the DDL for each table referenced, and I will return a set-
processing single SELECT.

Post by rebelde
We are talking about a high complex Library Management System with some 500
tables, written in C and ESQL/C and we can't so easy adopt on new
Optimizer. At least we expect to get the old performance with the new ASE.
Is this to much expectation? :-)

Absolutely not. I have performed quite a few benchmarks, and have
quite "complex" databases. My clients have some very badly designed
databases. ASE 15.x is much, much faster than 12.5. But, guess what,
it is no faster on a badly designed db, or on poor SQL code. The more
normalised your db, the cleaner your code, the faster ASE 15.x will
be. I have never seen it slower than 12.5, but then, I have never
written a cursor in my life (I have examined and replaced hundreds).

New Optimiser. This is where the problem is. Again, clean code and a
decent db result in no problems whatsoever. Either poor code or
monster of a db, and the new Optimiser will make incorrect decisions,
or take mins evaluating thousands of possible plans. Both of which
can be reined in with "optimization timeout limit", etc.

But since you are running with "enable compatibility mode", you are
not getting most of the 15.5 features, you are operating as close to
12.5 operation that the 15.x binary can give you. You have taken the
flight option, but you insist on having full dinner service before
landing, so live with the fact that you will be circling for half an
hour before landing.

Complex dbs are no problem. Sybase excels with highly complex dbs.
Complex code is no problem. Bad code, poorly designed code, overuse
of temp tables, they are the problems, and they will be exposed during
any migration to any major release. Cursor code is limited to single
row processing, so it doesn't matter how fast or faster the engine is,
you are operating at the speed of moving X thousand rows across the I/
O subsystem and across the network, one row/packet at a time.

Having an app, with SQL code, such that it is glued at the hip to a
particular optimiser, is a problem. That your company invested so
much money in an app, which is now tied to a platform that is no
longer being maintained, and cannot move to the new (faster, better,
more features) maintained binary, *is* a problem.

Post by rebelde

e. Why, exactly, are you using ESQL/C ... cursor ...

e.1 I am not at all convinced that you can't extract all the affected
rows (including the blobs), and process them once, in a stream; that
you have to go through the entire process one row at a time.

e.2 You may find the Full Text Search Option of benefit to you, it is
perfect for a library system, and for searching the content of Text
columns.

Post by rebelde

... server-side utilities ... client-side utilities ... 100s of times faster.

I know, but this does not help here.

Of course. But for the future, for the next version of the app (there
will be one, it isn't going to sit at 15-year-old code, unchanged,
no ?), you might want to take note of that, and write functions that
stream the data, 1,000 rows, once, rather than execute 1,000 loops/
processes on the data, one row at a time. Good design is about using
the Right Tool for the Job, and using appropriate tools in combination
to get the desired result. It is not about using SQL only, via a C
program only; that is Maslow's Hammer.

Post by rebelde
The main question is: why is the same ESQL/C app with 15.0.3 twice as time
slower reading rows as 12.5.4?

I do not understand why/how you suggest you have "proved" anything of
the sort. You are comparing apples and oranges, actually stale, hard
apples from Spain with stale, soft oranges from Brazil.

If you want to run a reasonable test, to prove anything, you need to:
- install ASE 12.5.4 on the SuSe box
- configure it properly (55 parms + resources) for the SuSe box
- dump/load the db from the SPARC to the Suse; or bcp-out/in and
rebuild all DataStructures
- verify that timings of the app are as expected/better/worse; resolve
those issues
- *now* you have a benchmark against which you can test or prove
something related to the conclusion you suggest

Then:
- install ASE 15.0.3 on the SuSe box
- ensure you are using the same edition: either Enterprise or Small
Business
- configure it properly (250 parms + resources) for the SuSe box
- ensure that the 15.0.3 config is intentionally the same as the
12.5.4 config, that there is no bias XOR that you are taking advantage
of new features, and make a declaration re the specifics of that
- ensure you have the same named caches, and cache settings,
parallelism, etc
- ensure you have followed the Migration Guidelines reasonably (not
perfectly, but do not miss the important steps)
- ensure you are using memory and the I/O subsystem exactly the same
way (/ufs, raw partitions, SAN LVs, etc)
- personally, I do not care if you use the "build new server" or the
"upgrade server in place" method; I care that you have done it
correctly and *verified it is correct*
- personally, I do not care if you use the "dump/load database" or
the "upgrade database in place" method; I care that you have done it
correctly and *verified it is correct*
- ensure you have rebuilt all 12.0 Index Structures to 15.x Index
Structures (you may even have the TextChains sitting at 11.9
DataStructures)
- verified all that
- run the app code up and test the hell out of it, as per documented
requirement; take note of the completely re-written Optimiser;
identify all issues; resolve them
- now run the same benchmark

- if one of your issues is that you can't use the 15.0 Optimiser, that
you must use "enable compatibility mode", then you will never be able
to compare 12.5.4 apples with 15.0.3 apples; you are comparing 12.5.4
apples with 15.0.3 tinned apples. The syrup is sweet but the apples
are devoid of substance. You can draw some conclusions, but only on
that declared basis (12.5.4 vs 15.0.3 Compatibility Mode). I can tell
you from experience of scores of Migrations, before you spend any time
on a proper test:
____ (a) bad code, if it runs unchanged, will run *slightly* faster in
15.0.x native mode
____ (b) may need changes, to run at all
____ (c) no idea of compatibility mode, because most people (at least
my customers) are ready to change code in order to obtain the 15.0.x
speed, and
____ (d) good code will run unchanged, completely free of new
Optimiser "issues", at least 10, maybe 20 times faster.
I have not changed one single line in our 700 table, 1.2TB database,
Migration was effortless. We changed a little bit of code *after*
migration to take advantage of new features. It is a different story
with customers, whose code is of varying quality.

- one day, when you overcome the requirement to run in compatibility
mode, then you can compare 12.5.4 apples with 15.0.3 apples, and then
you can draw some conclusions, and make some claims, such as you have
made. The fact that you require compatibility mode, places your code
(most of it ?) squarely in the poor code category.

3. Do me a personal favour and execute this code in your 15.0.3
subject database, then post the results.

SELECT OBJECT_NAME(id),
________name,
________"Suspect DataStructure"
____FROM sysindexes
____WHERE status & 32768 = 32768
UNION
SELECT OBJECT_NAME(id),
________"",
________"11.9 Table"
____FROM sysobjects
____WHERE type = "U"
____AND sysstat2 & 65536 = 0
ORDER BY 1, 2

4. Other.
ASE 15.0.3 has a totally re-written Optimiser; it took a while to bed
it down. The 12.5.x End of Life Notice coincided with the release of
15.5. I never certified *any* 15.0.x release, they were all
horrible. ASE 15.5 is good. Do not waste time going from 12.5.4 to
15.0.3 to 15.5; go directly to 15.5.

5. Attitude.
You appear to be very app-centric, not at all friendly with ASE,
expecting new binaries to service your app exactly the same as your 15-
year-old out-of-date end-of-lifed binary. That may not be the best
way to serve your employer. They made an investment in a Sybase
Licence, that progresses with the times, not in a binary that will not
change. The 12.5.4 King is Dead, Long Live the 15.0 King ! Release
your happy past with the old king, get friendly with the new majesty,
learn about it. Refuse to do that, and you will never get over the
grief. I can assure you the new one will serve you at least as well
as the old one, and definitely better if you get friendly with it.

Further, you will get far more performance out of your app, if you get
to know ASE (old or new), and design the app for ASE, rather than in
isolation from it. I can guarantee at least 100 times better
performance for each cursor that you replace with set-processing code,
but that will only be attractive to you if you release your app-
centrism, and wish to take advantage of the librarys Sybase Licence
investment.

Regards
Derek

rebelde

2011-10-18 08:18:51 UTC

Permalink

Post by Derek Asirvadem
4. Other.
ASE 15.0.3 has a totally re-written Optimiser; it took a while to bed
it down. The 12.5.x End of Life Notice coincided with the release of
15.5. I never certified any 15.0.x release, they were all
horrible. ASE 15.5 is good. Do not waste time going from 12.5.4 to
15.0.3 to 15.5; go directly to 15.5.
5. Attitude.
You appear to be very app-centric, not at all friendly with ASE,
expecting new binaries to service your app exactly the same as your 15-
year-old out-of-date end-of-lifed binary. That may not be the best
way to serve your employer. They made an investment in a Sybase
Licence, that progresses with the times, not in a binary that will not
change. The 12.5.4 King is Dead, Long Live the 15.0 King ! Release
your happy past with the old king, get friendly with the new majesty,
learn about it. Refuse to do that, and you will never get over the
grief. I can assure you the new one will serve you at least as well
as the old one, and definitely better if you get friendly with it.
Further, you will get far more performance out of your app, if you get
to know ASE (old or new), and design the app for ASE, rather than in
isolation from it. I can guarantee at least 100 times better
performance for each cursor that you replace with set-processing code,
but that will only be attractive to you if you release your app-
centrism, and wish to take advantage of the librarys Sybase Licence
investment.

Hi Derek,

I did what you suggest and stopped thinking about my apps, but watching what ASE 15.x
is doing after receiving the few SQL statements of

begin transaction
go
DECLARE koe_daten_seq CURSOR FOR SELECT SYB_IDENTITY_COL,* FROM koe_daten WHERE katkey=1
go
open koe_daten_seq
go
fetch koe_daten_seq
go
rollback
go

the ASE is idle, that's why it is easy to watch with strace(1), here is what the King is doing:

0.000083 recv(25, "\17\1\0 \0\0\0\0!\23\0\0\0\0begin transaction\n", 2048, 0) = 32
0.000136 send(25, "\4\1\0\21\0\0\0\0\375\4\0\1\0\0\0\0\0", 17, 0) = 17
0.000081 select(26, [24 25], NULL, NULL, {0, 0}) = 0 (Timeout)
0.000063 select(26, [24 25], NULL, NULL, {0, 0}) = 1 (in [25], left {0, 0})
0.000081 recv(25, "\17\1\0h\0\0\0\0![\0\0\0\0DECLARE koe_daten_"..., 2048, 0) = 104
0.000470 send(25, "\4\1\0\21\0\0\0\0\375\4\0\1\0\0\0\0\0", 17, 0) = 17
0.000081 select(26, [24 25], NULL, NULL, {0, 0}) = 0 (Timeout)
0.000068 select(26, [24 25], NULL, NULL, {0, 0}) = 1 (in [25], left {0, 0})
0.000085 recv(25, "\17\1\0!\0\0\0\0!\24\0\0\0\0open koe_daten_seq"..., 2048, 0) = 33
0.000202 gettimeofday({1318924357, 741825}, NULL) = 0
0.001216 gettimeofday({1318924357, 743042}, NULL) = 0
0.000392 send(25, "\4\1\0\21\0\0\0\0\375\4\0\1\0\0\0\0\0", 17, 0) = 17
0.000097 select(26, [24 25], NULL, NULL, {0, 0}) = 0 (Timeout)
0.000065 select(26, [24 25], NULL, NULL, {0, 0}) = 0 (Timeout)
0.000070 select(26, [24 25], NULL, NULL, {0, 0}) = 1 (in [25], left {0, 0})
0.000078 recv(25, "\17\1\0\"\0\0\0\0!\25\0\0\0\0fetch koe_daten_se"..., 2048, 0) = 34
0.000338 send(25, "\4\1\2\22\0\0\0\0a>\1\0\0\7\0\0\5amper\5sisis\tkoe"..., 530, 0) = 530
0.000084 select(26, [24 25], NULL, NULL, {0, 0}) = 0 (Timeout)
0.000062 select(26, [24 25], NULL, NULL, {0, 0}) = 0 (Timeout)
0.000068 select(26, [24 25], NULL, NULL, {0, 0}) = 0 (Timeout)
0.000055 select(26, [24 25], NULL, NULL, {0, 16666}) = 1 (in [25], left {0, 17000})
0.000354 recv(25, "\17\1\0\27\0\0\0\0!\n\0\0\0\0rollback\n", 2048, 0) = 23

Do you see the problem? I do :-)

between receiving 'open koe_daten_seq' and sending the response to ISQL there is:

-- NO disk i/o
-- NO further syscall than gettimeofday() and the second one with a delay of 1.216ms

that means that the time is spent in CPU cycles within ASE! The King is thinking.

Thanks

matthias

rebelde

2011-10-18 10:16:08 UTC

Permalink

0.000085 recv(25, "\17\1\0!\0\0\0\0!\24\0\0\0\0open koe_daten_seq"...,2048, 0) = 33
0.000202 gettimeofday({1318924357, 741825}, NULL) = 0
0.001216 gettimeofday({1318924357, 743042}, NULL) = 0
0.000392 send(25, "\4\1\0\21\0\0\0\0\375\4\0\1\0\0\0\0\0", 17, 0) = 17

the same situation in a server ASE 12.5.x is:

0.000079 recv(48, "\17\1\0!\0\0\0\0!\24\0\0\0\0open koe_daten_seq"..., 512, 0) = 33
0.000542 gettimeofday({1318930882, 566}, NULL) = 0
0.000084 send(48, "\4\1\0\21\0\0\0\0\375\4\0\1\0\0\0\0\0", 17, 0) = 17

so, we now know where the time in 15.x is spent: in CPU cycles, twice as much as
in 12.x in this case of OPEN CURSOR

matthias

Derek Asirvadem

2011-10-18 15:31:50 UTC

Permalink

On Oct 18, 9:16 pm, rebelde <***@unixarea.de> wrote:

Matthias

That is excellent work.

Post by rebelde
so, we now know where the time in 15.x is
spent: in CPU cycles, twice as much as
in 12.x in this case of OPEN CURSOR

You have one bad experience with ASE 15, one cursor, one conclusion.
I have about good expereince with about 35 ASE 15s (with the details
noted above), quite a different conclusion. ASE 15 does have specific
enhancements for cursors. Please read, you can restore 12.5 operation
for cursor sensitivity (I am not recommending that, only that you
should be aware of it). There is also a CR for a known bug but I am
pretty sure it is fixed in ESD#4. I will check tomorrow.

But the diagnosis is not complete yet. We have to figure out (a) why
the King is not being woken up when the I/O has completed and (b) what
the hell He is doing so much thinking for. I have never had to go
down to the level of strace to figure out an ASE performance problem,
so I am relying on you to interpret the report correctly.

Have you noted [2.6] above, what is your RPSC ?

3.1 How many engines ?

3.2 How many CPU/cores/threads on the box ?

3.3 What is the CPU/I.O/Idle from sp_symon (of at least one hour,
including the cursor operating somewhere in this duration) ?

3.4 Please confirm, you are running nothing but ASE on this box..

Regards
Derek

rebelde

2011-10-19 07:57:26 UTC

Permalink

Post by Derek Asirvadem
Have you noted [2.6] above, what is your RPSC ?

Yes. We stumbled about this parameter already years ago (search in c.d.s.
for my name :-) and got to know that 2000 is a bad value; even 50 is bad
because it let cycle ASE dataservers to much in poll(2) syscalls until
giving up the CPU; normaly we set it to 5 or even below to 2; we run around
100 12.5.x installations at customer sides, i.e. we are not newbies :-)
in the ASE 15.x in our case RPSC is set to 5:

$ fgrep runnab /opt/syb154/ASE-15_0/oclcSYB.cfg
runnable process search count = 5

Post by Derek Asirvadem
3.1 How many engines ?

the ASE 12.5 and 15.x have each only one dataserver:

$ ps ax | fgrep dataserver
4569 ? Ss
388:50 /opt/syb154/ASE-15_0/bin/dataserver -d/opt/syb154/data/master.dat -e/var/spool/sybase/oclcSYB.log -c/opt/syb154/ASE-15_0/oclcSYB.cfg
-M/opt/syb154/ASE-15_0 -soclcSYB
6709 ? Ss
55:57 /opt/sybase/ASE-12_5/bin/dataserver -d/database/sybase/1252/master.dat -e/var/spool/sybase/sisisSYB.log -c/opt/sybase/ASE-12_5/sisisS
YB.cfg -M/opt/sybase/ASE-12_5 -ssisisSYB

Post by Derek Asirvadem
3.2 How many CPU/cores/threads on the box ?

SLES sees 4 CPU's:

$ dmesg | egrep -i 'CPU[0-9]'
CPU0: Intel(R) Xeon(TM) CPU 3.20GHz stepping 03
CPU1: Intel(R) Xeon(TM) CPU 3.20GHz stepping 03
CPU2: Intel(R) Xeon(TM) CPU 3.20GHz stepping 03
CPU3: Intel(R) Xeon(TM) CPU 3.20GHz stepping 03

Post by Derek Asirvadem
3.3 What is the CPU/I.O/Idle from sp_symon (of at least one hour,
including the cursor operating somewhere in this duration) ?

the two ASE on this box (a test server) are idle, i.e. no other connections
to the port 4000 and 5000:

$ netstat -an | egrep '4000|5000'
tcp 0 0 10.49.92.18:4000 0.0.0.0:* LISTEN
tcp 0 0 10.49.92.18:5000 0.0.0.0:* LISTEN

Post by Derek Asirvadem
3.4 Please confirm, you are running nothing but ASE on this box..

ACK; only the two dataserver, ssh login and the ISQL tests with this 10
lines of SQL.

matthias

Derek Asirvadem

2011-10-19 18:29:45 UTC

Permalink

On Oct 19, 6:57 pm, rebelde <***@unixarea.de> wrote:

Thanks for the response.

----

Post by rebelde
we are not newbies :-)

Sure, but apparently you are not friendly with ASE either. The lip
service about the King, in the context of a 10-line SQL test that is
held up as if it proves something; the steadfast commitment to the
microscope ... those are not the actions of someone committed to
resolution, they are the actions of someone holding on to righteous
proof, justified enmity. The German people I have come to love and
cherish are not like that, but then I have only had 23 visits over the
last 26 years, I am still finding out.

I trust you appreciate, the parm had to be checked, I am just doing my
job, no suggestion that you are this or that.

----

Post by rebelde

Post by Derek Asirvadem
3.1 How many engines ?

What does
$ isql -U -P -SoclcSYB ... EXEC sp_configure "max online engines"
and
$ ditto -SsisisSYB
return ?
I am not concerned about the .cfg on the disk.

----

Post by rebelde

Post by Derek Asirvadem
3.2 How many CPU/cores/threads on the box ?

Please advise, what is the purpose of running one engine on a 4xCPU
box, or is this just a test box that happens to be 4xCPU ? What is
the normal production config: one ASE with ? on a 4xCPU box ?

----

Post by rebelde

Post by Derek Asirvadem
3.3 What is the CPU/I.O/Idle from sp_symon (of at least one hour,
including the cursor operating somewhere in this duration) ?

the two ASE on this box (a test server) are idle, i.e. no other connections
$ netstat -an | egrep '4000|5000'
tcp 0 0 10.49.92.18:4000 0.0.0.0:* LISTEN
tcp 0 0 10.49.92.18:5000 0.0.0.0:* LISTEN

Pardon me, but that is not sp_sysmon.

Try:
- isql -SsisisSYB ... SHUTDOWN

- isql -SoclcSYB -oASE_15_0_3_4_config_.txt
- EXEC sp_configure
- EXEC sp_cacheconfig
- EXEC sp_helpcache
- EXEC sp_helpdevice

- run the app cursor, keep it running for over 60 mins

- isql -SoclcSYB -oASE_15_0_3_4_sysmon_yymmdd_hh.txt
- EXEC sp_sysmon "00:60:00"
- terminate the cursor app after the isql session returns
- email the two txt files to me

If confidentiality is an issue, please be advised that we all our
custs are banks and universities, I will send you the normal contract
& non-disclosure for a $0 fixed price project. We were Sybase
Partners for 14 years, and gave that up last year. Visit www.softwaregems.com.au.

I have to permit you to use your microscope, I trust you will permit
me to use my helicopter, together we will resolve this. Here is a
sample of photographs of the scenery during the flight in a
Messerschitt Bf 109, over Schwäbische Alb (yours will be one hour, any
vertical bar, exploded):
http://www.softwaregems.com.au/Documents/Reference/Reference%20Y%20Stats.pdf

Quite coincidentally, something that may interest you, we determined
the peak of ASE Busy Percent at 14:05, when the system was idle, was a
Sybase bug. It took some time to fix. We may get lucky with your
stats, but that it not what I am expecting. (To close that
possibility, I strongly recommend 15.5, this problem may disappear. I
did not certify any 15.0.x release.)

If you can provide vmstat & iostat for the same 60 minute, that will
be a substantial help. In which case, the ASE+vmstat+iostat metrics
can appear on the same graph.

----

Post by rebelde

Post by Derek Asirvadem
3.4 Please confirm, you are running nothing but ASE on this box.

ACK; only the two dataserver, ssh login and the ISQL tests

Apologies. I should have said "Please confirm, you are running nothing
but *one* ASE on this box". I now know the answer is two ASEs. For
this test, and for all future tests, while we are inspecting ASE 15,
we need ASE 12 shut down (not idle).

ASE (11 or 12 or 15) is designed and architected to be run as the
*only* server on the box. Two ASEs, or one ASE plus other services,
is guaranteed to run into problems, of various kinds. I suggest you
read up on the subject. For now, we will test only one ASE 15, as a
single server. After that you are free to do whatever you like: if
there is a significant differences when you run two ASEs, we can
handle that as a separate question.

----

Post by rebelde
with this 10 lines of SQL

I appreciate that that test is very meaningful to you. Please be
advised, that to me, (a) it is not a valid test and (b) it proves
nothing that is relevant to diagnosing your posted problem (it may
well prove certain unrelated facts to you).

I am expecting to debug your posted problem by other means. Please
note, the test I need you to run for the 60 mins, is the test at the
top of the thread, which continuously FETCHES rows+text via the cursor
via the ESQL/C app; it is *not* the 10-line ISQL test.

----

Post by rebelde
1. unloading 1000 rows does take twice as the time before, even if all rows are in cache

That statement is incorrect. The rows may be in cache, but the text
chain is not, and the app is processing the text chain in the FETCH.

Regards
Derek

rebelde

2011-10-31 13:06:32 UTC

Permalink

Post by Derek Asirvadem

Derek Asirvadem

2011-11-03 21:17:03 UTC

Permalink

Matthias

Post by rebelde
Any comments on this?

Sure.

1. Ok, so you have not taken up my offer of a flight.

2. The question is completely different. As stated previously, if
you post the SHOWPLAN, we need the full DDL for the tables concerned.

3. What results (at all levels of inspection) do you get for:

DECLARE koe_daten_seq CURSOR FOR
SELECT SYB_IDENTITY_COL,
<column_list>
FROM koe_daten
WHERE katkey=1

And exclude all blobs from <column_list>.

As I am sure you are aware, you can't load * into a @variable; in the
normal circumstance, the cursor would load each column into a separate
variable; * cannot be used; it is only useful for testing purposes.

4. Can you please run this code on the 15.0.3 database that koe_daten
resides, and post the results.

SELECT [Pre-11.9 Structure] = name
FROM sysobjects
WHERE type = "U"
AND sysstat2 & 65536 = 0
ORDER BY 1

5. If you have not done so, execute:

DBCC REBUILD_TEXT( table_name)

On all tables that have blobs.

You can fix up a little bit here and a little bit there, but if you
have not unloaded/reloaded (via bcp out/in) your large tables,
especially those with blobs, in a long time, then it is much better to
do that. ASE 11.9 introduced far more efficient DataStructures for
NCIs and blobs. Take advantage of everything you have learned since
the database was first deployed, use Segments, distribute the tables,
distribute the DataStructures-of-tables, separate NCIs from the CI/
Heaps, etc. You can eliminate Level I Fragmentation while you are at
it (unloading/reloading tables in place will remove Level II but not
Level I).

Regards
Derek

Derek Asirvadem

2011-11-07 03:45:25 UTC

Permalink

Post by Derek Asirvadem
You can eliminate Level I Fragmentation while you are at
it (unloading/reloading tables in place will remove Level II but not
Level I).

Here's a doc that explains what I mean.

http://www.softwaregems.com.au/Documents/Sybase%20GEM%20Documents/Sybase%20Fragmentation.pdf

Regards
Derek