Audit tables in a database are useful for several purposes:
- Recording a history of changes of each row in a table
- Determining who modified which rows when and how
- Implementing change data capture, including deleted rows
- Saving your butt!
For data warehousing purposes, we are primarily interested in the change data capture aspect. Setup of a trigger based approach to CDC can be automated for any or all tables and avoids the need to change any existing tables or applications that depend on them. This approach also enables us to efficiently identify only the rows that have changed since our last ETL sync and it also captures deleted rows, something that can be tricky with timestamps in the original table.
cdc_audit is a little tool I wrote that presently contains two scripts:
- cdc_audit_gen_mysql.php : auto generates audit tables and triggers for all or selected tables in a DB.
- cdc_audit_sync_mysql.php : syncs diffs in audit table(s) to target .CSV file(s) that can be used in hadoop, map-r, etc.
A nice feature of cdc_audit is that it simplifies dealing with a pesky mysql trigger limitation. In mysql, only one trigger may be defined at a time for a given table + event, eg only one AFTER INSERT may be defined. So if we already have a trigger defined for a table that we want to audit, we have a problem. cdc_audit solves the problem by automatically incorporating the old trigger statement into the new trigger,so the existing trigger logic is preserved, but we also get the new logic to insert changed rows into the audit table.
Another feature is incremental audit table wipe. There are a few audit table patterns, and the pattern used by cdc_audit is performant and simple but can use a lot of disk space because each changed row is duplicated for every update to the source table, even if only a single column changed. For CDC purposes, our primary goal is to identify and sync the changed rows efficiently; we may not care much or at all about the audit history for any purposes beyond that. And for large tables, we do care about disk space. Incremental audit table wipe is an optional feature that can delete all but the last row in the audit table. Keeping the last row enables the sync process to continue working. Rows are deleted incrementally, 100 rows at a time, to avoid putting too much load on the database and preventing concurrent sessions from writing to the audit table.
- automates generation of audit tables, one audit table per source table
- automates generation of triggers to populate audit tables
- automates syncing of new rows in audit tables to .csv files.
- fast triggers. only one insert and 0 selects per trigger execution.
- Reads mysql information_schema to automatically determine tables and columns.
- Can generate tables + triggers for all database tables, or a specified list.
- Can sync audit tables for all database tables, or a specified list.
- Retains pre-existing trigger logic, if any, when generating AFTER triggers.
- sync script option to delete all but last audit row, to keep source DB small.