Discussion: Referential Integrity
Douglas Gregor
2011-10-16 14:30:44 UTC
Performance-wise, how should referential integrity be enforced when
dealing with a modest data warehouse? If millions of records are
inserted into the database, daily, triggers don't seem to be the
answer. Are constraints (i.e. foreign key constraints) the answer? I
feel foreign key constraints should not break the boundaries of a
database meaning tables in DB1 should not have any constraints on
tables in DB2. Crossing DB boundaries doesn't make sense.
Rob V
2011-10-16 15:44:25 UTC
Post by Douglas Gregor
Performance-wise, how should referential integrity be enforced when
dealing with a modest data warehouse? If millions of records are
inserted into the database, daily, triggers don't seem to be the
answer. Are constraints (i.e. foreign key constraints) the answer? I
feel foreign key constraints should not break the boundaries of a
database meaning tables in DB1 should not have any constraints on
tables in DB2. Crossing DB boundaries doesn't make sense.
I would say that having FK constraints cross the DB boundary *can* make
sense. It just depends on how your system has been designed.
Apart from that, I think RI constraints would be a good way to go;
triggers are probably not for this type of use case. You can also avoid
using constraints or triggers if you know that in most cases the RI will
be fine, and just run a query after loading the data to verify that the
RI is OK (and then delete/fix the rows that do not match).
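
For example, a post-load check along these lines (a sketch only;
sales, customer and customer_id are invented names):

-- count the orphans: fact rows with no matching parent row
select s.customer_id, count(*)
from   sales s
where  not exists (select 1 from customer c
                   where  c.customer_id = s.customer_id)
group  by s.customer_id

-- then remove them (or copy them aside first for repair)
delete sales
from   sales s
where  not exists (select 1 from customer c
                   where  c.customer_id = s.customer_id)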
--
HTH,

Rob V.
-----------------------------------------------------------------
Rob Verschoor

Certified Sybase Professional DBA for ASE 15.0/12.5/12.0/11.5/11.0
and Replication Server 15.0.1/12.5 // TeamSybase

Author of Sybase books (order online at www.sypron.nl/shop):
"Tips, Tricks& Recipes for Sybase ASE" (ASE 15 edition)
"The Complete Sybase ASE Quick Reference Guide"
"The Complete Sybase Replication Server Quick Reference Guide"

***@NO.SPAM.sypron.nl | www.sypron.nl | Twitter: @rob_verschoor
Sypron B.V., The Netherlands | Chamber of Commerce 27138666
-----------------------------------------------------------------
1dg618
2011-10-16 16:24:38 UTC
Post by Rob V
Post by Douglas Gregor
Performance-wise, how should referential integrity be enforced when
dealing with a modest data warehouse? If millions of records are
inserted into the database, daily, triggers don't seem to be the
answer. Are constraints (i.e. foreign key constraints) the answer? I
feel foreign key constraints should not break the boundaries of a
database meaning tables in DB1 should not have any constraints on
tables in DB2. Crossing DB boundaries doesn't make sense.
I would say that having FK constraints cross the DB boundary *can* make
sense. It just depends on how your system has been designed.
Apart from that, I think RI constraints would be a good way to go;
triggers are probably not for this type of use case. You can also avoid
using constraints or triggers if you know that in most cases the RI will
be fine, and just run a query after loading the data to verify that the
RI is OK (and then delete/fix the rows that do not match).
If a table in one database has a constraint on a table in another
database, the data can't be deleted until the referencing data in the
other database is deleted; therefore, one has to find all the
tentacles. Cleanup could be a pain.
Derek Asirvadem
2011-10-17 00:35:10 UTC
Post by 1dg618
If a table in one database has a constraint on a table in another
database, the data can't be deleted until the referencing data in the
other database is deleted; therefore, one has to find all the
tentacles. Cleanup could be a pain.
I think we are talking about the relevance of FKs across dbs here, not
cascading deletes, which work the other way around. Generally DWs are
inserted into, not deleted from, and in the context here, the cross-db
table is the child, not the parent.

But if you have the need for cascading deletes, that is no problem.
Just write a simple sproc in the parent_db to:
- delete grandchild_table;
- then delete child_table;
- then delete parent_table.
A sproc can reference any db on the server (it can even address tables
in dbs on another server, but let's not get distracted).
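
Something like this, as a sketch (the schema and key names are
invented; the point is only the bottom-up delete order and the
cross-db references):

use parent_db
go
create procedure pr_cascade_delete
    @key int
as
begin
    -- grandchild first, parent last, so no FK is ever violated
    begin tran
    delete child_db..grandchild_table where parent_key = @key
    delete child_db..child_table      where parent_key = @key
    delete parent_table               where parent_key = @key
    commit tran
end
go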
Derek Asirvadem
2011-10-17 00:20:22 UTC
Post by Douglas Gregor
Performance-wise, how should referential integrity be enforced when
dealing with a modest data warehouse? If millions of records are
inserted into the database, daily, triggers don't seem to be the
answer. Are constraints (i.e. foreign key constraints) the answer?
Yes, definitely. Triggers are inappropriate for many reasons,
definitely inappropriate for a DW.

Declarative Referential Integrity is essential, both for REFERENCES
and CHECK CONSTRAINTS. Sybase does some clever things under the
covers that enhance the performance of inserts/updates, at both the
child and the parent level.

It is far preferable to prevent corrupt data from getting into
the db than to look for it and clean it up after it gets there.
Sybase is set up for that. E.g. you can stream data into a table
using "fast" bcp, which means dropping the indices and disabling the
triggers on the table. bcp honours DEFAULTS but unfortunately not RI
CONSTRAINTS. So the trick that most of us use is (sketched in SQL
after the lists):
1 create a staging_db with the truncate_log_on_chkpt and select_into/
bulk_copy/Parallel_sort db_options set
2 that allows you to *not* set those options on the main target_db,
which remains fully recoverable
3 create a copy of the table in the staging_db, with no indices and
no triggers
4 fast bcp the data into the staging_db..table
5 execute DELETE WHERE to remove bad data (e.g. rows that do not
satisfy FK constraints)
6 insert target_db..table/select from staging_db..table

If you want even more speed, and you are willing to back up the
target_db *after* loading, rather than at some arbitrary time, replace
step 6 with:
6 set the select_into/bulk_copy/Parallel_sort db_option on target_db
7 select into target_db..table from staging_db..table
8 optional: drop/create clustered index to de-fragment the table and
maintain speed
9 clear the select_into/bulk_copy/Parallel_sort db_option on target_db
10 dump target_db
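
Putting the two lists together, a minimal sketch (all db, table and
column names are invented; step 4 runs from the OS shell, not from
SQL):

use master
go
-- steps 1-2: the risky db options go on staging_db only
exec sp_dboption staging_db, 'trunc log on chkpt', true
exec sp_dboption staging_db, 'select into/bulkcopy/pllsort', true
go
use staging_db
go
checkpoint  -- make the option changes take effect
go
-- step 3: structure-only copy, no indices, no triggers
select * into stage_sales from target_db..sales where 1 = 2
go
-- step 4, from the OS shell ("fast" bcp, since stage_sales has no
-- indices or triggers):
--   bcp staging_db..stage_sales in sales.dat -c -Usa -SMY_SERVER
-- step 5: delete rows that would break RI in the target
delete stage_sales
from   stage_sales s
where  not exists (select 1 from target_db..customer c
                   where  c.customer_id = s.customer_id)
go
-- step 6: fully logged insert into the real table
insert target_db..sales
select * from stage_sales
go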
Post by Douglas Gregor
I feel foreign key constraints should not break the boundaries of a
database meaning tables in DB1 should not have any constraints on
tables in DB2. Crossing DB boundaries doesn't make sense.
Well, it depends on how you have set them up, and what you are using
them for.

If you take the concept that "a db is a recoverable unit", and apply
it strictly, so that each db is completely independent, then yes, you
are correct. And it keeps life nice and simple.

But most of us have mature systems and databases, and we can control
the order of db recovery.  Having one database for trading, market
info, and data feeds in/out of the server, then having one database
each for front office and back office, which have FK constraints
REFERENCING the trading database, is normal. We really do not want to
duplicate the trading database; and front and back office have to be
clinically separated; but they both source data from trading. Sybase
supports that beautifully. On recovery, ensure you load trading_db
first; then front_db or back_db.
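
For example, a sketch (table and column names are invented; ASE's
REFERENCES clause accepts a database-qualified table name):

use front_db
go
-- each position must refer to an existing trade in trading_db;
-- the FK constraint crosses the db boundary
create table fo_position (
    position_id int           not null,
    trade_id    int           not null
                references trading_db..trade (trade_id),
    qty         numeric(18,4) not null,
    primary key (position_id)
)
go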

Regards
Derek
1dg618
2011-10-17 13:18:52 UTC
Post by Derek Asirvadem
[snip]
Thank you for your reply filled with awesome information.
1dg618
2011-10-17 13:24:28 UTC
Post by Douglas Gregor
Performance-wise, how should referential integrity be enforced when
dealing with a modest data warehouse? If millions of records are
inserted into the database, daily, triggers don't seem to be the
answer. Are constraints (i.e. foreign key constraints) the answer?
Post by Derek Asirvadem
Yes, definitely.  Triggers are inappropriate for many reasons,
definitely inappropriate for a DW.
But REFERENCES and CHECK CONSTRAINTS are not wise if they span
databases?

Is there any way to speed up the building or dropping of an index on
a very large table?
1dg618
2011-10-17 15:58:49 UTC
If I have the following columns:

updated_by DEFAULT SUSER_NAME()
updated_date DEFAULT GETDATE()
created_by DEFAULT SUSER_NAME()
created_date DEFAULT GETDATE()

When the records are inserted, the defaults will populate the columns.

Then I can have an update trigger to populate the updated_by and
updated_date. Is this a good idea?

Will the default on the insert fire the update trigger?
Derek Asirvadem
2011-10-18 15:08:53 UTC
Post by 1dg618
Thank you for your reply filled with awesome information.
My pleasure.
Post by 1dg618
Post by Derek Asirvadem
Yes, definitely.  Triggers are inappropriate for many reasons,
definitely inappropriate for a DW.
But REFERENCES and CHECK CONSTRAINTS are not wise if they span
databases?
I have detailed the answer to that above: sometimes they are wise,
other times unwise. Is there anything specific that you find
confusing?
Post by 1dg618
Post by 1dg618
Is there any way to speed up the building or dropping of an index on
a very large table?
(it is best that you open a new thread for each separate question.)

Drop. It is very fast and cannot be made faster.

Create Index. Yes. Make sure you do the following (in descending
order of returns; see the sketch below):
- use the WITH SORTED_DATA option
- have enough sort buffers
- have enough proc cache
- that the index will fit into the data cache (default or named)

Composite keys are no problem, but of course it helps if the columns
in the index are (a) narrow and (b) fixed.
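
E.g. a sketch (the table, index and buffer figures are illustrative,
not recommendations):

exec sp_configure 'number of sort buffers', 2048
go
use target_db
go
-- rows were loaded in key order, so the sort can be skipped
create clustered index sales_ck
    on sales (sale_date, customer_id)
    with sorted_data
go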
Post by 1dg618
If I have the following columns:
updated_by DEFAULT SUSER_NAME()
updated_date DEFAULT GETDATE()
created_by DEFAULT SUSER_NAME()
created_date DEFAULT GETDATE()
When the records are inserted, the defaults will populate the columns.
If they are not supplied in the INSERT, yes.
Post by 1dg618
Then I can have an update trigger to populate the updated_by and
updated_date. Is this a good idea?
Yes, it is the common method.

An even better method: those of us who do not use triggers ensure
that (a) the update is transactional and (b) updated_by and
updated_date are set within the transaction.
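
A sketch of both (my_table and its columns are invented;
suser_name()/getdate() as per the DEFAULTs above):

-- common method: an update trigger stamps the audit columns
-- (by default ASE does not re-fire a trigger for its own update)
create trigger my_table_utrig
on my_table
for update
as
    update my_table
    set    updated_by   = suser_name(),
           updated_date = getdate()
    from   my_table t, inserted i
    where  t.id = i.id
go

-- trigger-less method: stamp the columns inside the transaction
begin tran
    update my_table
    set    amount       = 42,          -- the "real" change
           updated_by   = suser_name(),
           updated_date = getdate()
    where  id = 1
commit tran
go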
Post by 1dg618
Will the default on the insert fire the update trigger?
No. The insert trigger (if there is one) is executed on insert.
Separately, the update trigger (if there is one) is executed on update
of any column.
The default does not come into play.

Regards
Derek
1dg618
2011-10-18 23:55:07 UTC
Post by Derek Asirvadem
[snip]
Thank you. In the future, I'll create a new thread for each question.