Post by Douglas GregorPerformance-wise, how should referential integrity be enforced when
dealing with a modest data warehouse? If millions of records are
inserted into the database, daily, triggers don't seem to be the
answer. Are constraints (i.e. foreign key constraints) the answer?
Yes, definitely. Triggers are inappropriate for many reasons,
definitely inappropriate for a DW.
Declarative Referential Integrity is essential. Both for REFERENCES
and CHECK CONSTRAINTS. Sybase does some clever things under the
covers, that enhances performance of inserts/updates, at both the
child and the parent level.
It is far more preferable to prevent corrupt data from getting into
the db, than to look for it and clean it up after it gets there.
Sybase is set up for that. Eg. You can stream data into a table
using "fast" bcp, which means dropping the indices and disabling the
triggers on the table. bcp honours DEFAULTS but unfortunately not RI
CONSTRAINTS. So the trick that most of us use is:
1 create a staging_db with truncate_log_on_chkpt, select_into/
bulk_copy/Parallel_sort db_ptions set
2 that allows you to *not* set those options on the main target_db,
which remains fully recoverable
3 create a copy of the table in the staging_db, with no indices and
triggers
4 fast bcp the data into the staging_db..table
5 execute DELETE WHERE to remove bad data (eg. that does not satisfy
FK constraints)
7 insert target_db..table/select from source_db..table
If you want even more speed, and you are willing to backup the
target_db *after* loading, rather than at some arbitrary time:
6 set the select_into/bulk_copy/Parallel_sort db_option on target_db
7 select into target_db..table from staging_db..table
8 optional: drop/create clustered index to de-fragment the table and
maintain speed
9 clear the select_into/bulk_copy/Parallel_sort db_option on target_db
10 dump target_db
Post by Douglas GregorI feel foreign key constraints should not break the boundaries of a
database meaning tables in DB1 should not have any constraints on
tables in DB2. Crossing DB boundaries doesn't make sense.
Well, it depends on how you have set them up, and what you are using
them for.
If you take the concept that "a db is a recoverable unit" and
completely independent, strictly, then yes, you are correct. And it
keeps life nice and simple.
But most of us have mature systems and databases, and we can control
the order of db recovery. Having one database for trading, market
info, and data feeds in/out of the sever, then having one database
each for front office and back office, which have FK constraints
REFERENCING the trading database, is normal. We really do not want to
duplicate the trading database; and front and back office have to be
clinically separated; but they both source data from trading. Sybase
supports that beautifully. On recovery, ensure you load trading_db
first; then front_db or back_db.
Regards
Derek