Continuous Database Integration (CDBI) is the process of rebuilding your database and test data any time a change is applied to a project's version control repository.
Do you ever feel like your source code and database are operating in different "galaxies" throughout the development lifecycle on projects? As a developer, you may wait several days for a change to the database. You may even be restricted from making minor test data changes, or are afraid to make data changes for fear of ruining the one shared database for fellow developers. Situations like these are not unusual, and effectively utilizing CDBI can help alleviate some of these challenges and many others as well.
Revisiting the theme of the book, database integration is one of the parts of the Integrate button (see Figure 5-1), because it is based on the principle that database code (DDL, DML, configuration files, etc.) is, in essence, no different from the rest of the source code in a system. In fact, the artifacts related to database integration:
- Should reside in a version control system
- Can be tested for rigor and inspected for policy compliance
- And can be generated using your build scripts
Therefore, the building of the database can be incorporated into a CI system and can enjoy the same benefits as the rest of the project source code. What's more, changes to database source code can trigger an integration build just as other source code changes do.
FIGURE 5-1 Database integration in the Integrate button
____________________________________________________________________________
Not All Data Sources Are Alike
Some projects, or portions of projects, don't use a database
exactly the way that we define it in this chapter. However, most
projects need to persist data, be it in a flat file, an XML file, a
binary file, or an RDBMS. Regardless of your chosen persistent
store, the principles of CDBI apply.
____________________________________________________________________________
As a first step in describing how to automate database integration
with CI, we start by describing how to incorporate database integration
into a build process. The scripts used to build, configure, and populate
a database need to be shared with the rest of the project team, so we
discuss which database files are committed to a version control repository.
Automating a database integration build process solves only part
of the problem, so we go one step further by rebuilding the database
and data at every software change—making the verification process
continuous. If a team is adopting CDBI for the first time, most people
on a project will probably need to modify their development practices,
so we finish the chapter looking at effective CDBI practices.
____________________________________________________________________________
Refactoring Databases
The topics covered in this chapter could even be the subject of a
separate book. Other materials already make the case for treating
your database as just another type of source code that is
managed through the version control repository. This chapter
gives you the essentials to automate and run database integration
processes continuously.
____________________________________________________________________________
Automate Database Integration
On many projects, a database administrator (DBA) can often feel like a
short-order cook. DBAs typically have analytical skills that took many
years to cultivate, but they often spend most of their time performing
low-level command tasks. What's more, this job role can also be
stressful, because the DBA often becomes a development bottleneck as
the team members wait for the DBA to apply one small change to the
database after another. Here's a familiar scenario.
Nona (Developer): Hi Julie, will you set up a development database
for me on the shared development machine?
Julie (DBA): I am in the middle of something. I should be able to set it
up later this afternoon. Would you like the data from last week or an
export of today's data?
Nona: Today's data.
Julie: Okay, I can have that for you by tomorrow morning.
10 minutes later…
Scott (Technical Lead): I am unable to perform testing on the test
server because there are no assigned records for the Reviewer role.
Julie: Oh, let me create some test records that are assigned this role. I
think Nona may have used up all of those records.
Scott: Thanks. While you're at it, would you remove the Y/N constraint
on the APPROVED columns on the PERSON table? We'd like
to use different flags on this column.
It's more of the same on a typical day for the DBA. Not only is this
a poor use of the DBA's talents, it causes a significant bottleneck, especially
in the continuous approach promoted by CI. If you asked any
DBA what they'd rather do on a day-to-day basis, they would probably
tell you that they'd rather spend time on data normalization, improving
performance, or developing and enforcing standards, not giving people
database access or recreating databases and refreshing test data. In this
section, you'll see how you can automate these repetitive tasks so both
the DBA's and the team's time is spent on improving the efficacy and
____________________________________________________________________________
TABLE 5-1 Repeatable Database Integration Activities
| Activity |
Description |
Drop database
|
Drop the database and remove the associated data so that
you can create a new database with the same name. |
Create database
|
Create a new database using Data Definition Language (DDL). |
Insert system data
|
Insert any initial data (e.g., lookup tables) that your system
is expected to contain when delivered. |
Insert test data
|
Insert test data into multiple testing instances. |
Migrate database and data
|
Migrate the database schema and data on a periodic basis (if
you are creating a system based on an existing database). |
Set up database instances in multiple environments
|
Establish separate databases to support different versions
and environments. |
Modify column attributes and constraints
|
Modify table column attributes and constraints based on
requirements and refactoring. |
Modify test data
|
Alter test data as needed for multiple environments. |
Modify stored procedures(along with functions and triggers)
|
Modify and test your stored procedures many times during
development (you typically need to do this if you are using
stored procedures to provide behavior for your software). |
Obtain access to different environments
|
Log in to different database environments using an ID, password,
and database identifier(s). |
Back up/restore large data sets
|
Create specialized functions for especially large data sets or
entire databases. |
____________________________________________________________________________
efficiency of the database—not on simple administration. Table 5-1
identifies database integration activities typically performed by a
project member that can be automated.
Once you have automated these database-related tasks, you'll find
yourself solving problems just by dropping and creating a database
followed by inserting test data. This chapter's examples utilize Ant,
but the principles apply to any build platform that supports communicating
with a database. If your build platform is NAnt, Rake, or Maven,
you can do the same things this chapter demonstrates. Listing 5-1 executes
a series of SQL statements to create a database including its related
tables, comments, constraints, and stored procedures. The script also
applies test data for the given environment, such as development or
QA. Using this process, you can simply type ant db:prepare from
the command line and the build process will perform the tasks outlined
in Table 5-1. If you'd like to see this same process using other tools,
like NAnt or Maven, we've provided additional examples at the book's
associated Web site.
LISTING 5-1 build-database.xml: Automating Database Integration
Using Ant
____________________________________________________________________________
> ant –f build-database.xml db:prepare
Buildfile: build-database.xml
db:create:
[sql] Executing file: data-definition.sql
[sql]8 of 8 SQL statements executed successfully
db:insert:
[sql] Executing file: data-manipulation.sql
[sql] 60 of 60 SQL statements executed successfully
BUILD SUCCESSFUL
Total time: 20 seconds
 |
| About the authors: |
| Paul Duvall is the CEO of Stelligent, a firm that helps clients create production-ready software every day. Duvall contributed a chapter to No Fluff, Just Stuff Anthology: The 2007 Edition. Stephen M. Matyas III is the vice president of AutomateIt, a service branch of 5AM Solutions. Matyas is an expert in applied software engineering and Java. Andrew Glover is the president of Stelligent Incorporated and contributes to online publications such as IBM's developerWorks and Oreilly's ONJava and ONLamp portals. |
|
|
 |
 |
As you can see, using a single instruction from the command line
enables the execution of SQL scripts that define (db:create) and
manipulate a database (db:insert). We describe each of these tasks in
more detail in subsequent sections.
Figure 5-2 shows the steps to automate your database integration.
The following sections present a discussion of each component in
Figure 5-2.
Creating Your Database
To automate database integration, you must first create a database. In
this script, you typically drop and recreate the database, enforce data
integrity through constraints and triggers, and define database behaviour
Figure 5-2 The sequence of automated database integration
through stored procedures or functions. We are using Ant to automate
the execution of this process in Listing 5-2; however, as
mentioned earlier, you can also use make, shell, batch, Rake, Ruby, or
any number of tools. Notice that Ant provides a task to execute a SQL
script via the sql task. Using a build platform like Ant allows you to
perform the database integration activities using a sequential approach
and enforce dependencies on other targets (a set of tasks) in the script.
The example in Listing 5-2 demonstrates the use of Ant's sql
attributes, such as driver, userid, and password, to connect to the
database.
LISTING 5-2 build-database.xml: Defining Your Database Using an
Ant Script
____________________________________________________________________________
<target name="db:create" depends="filterSqlFiles" description="Create
the database definition">
<sql
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/"
userid="root"
password="root"
classpathref="db.lib.path"
src="${filtered.sql.dir}/database-definition.sql"
delimiter="/ /"/>
</target>
____________________________________________________________________________
Create Reusable Scripts
When you are writing a script that you plan to reuse, you can
define the attributes in a single file so that you only need to
define them one time for use in all of your manual and automated
scripts, rather than every time you use these attributes.
____________________________________________________________________________
In Listing 5-3, data-definition.sql is the SQL script that's called by
the Ant script in Listing 5-2. We're using a MySQL database in this
example, so some of the commands are MySQL-dependent. The datadefinition.
sql file is responsible for creating the database and its tables,
enforcing data integrity, and applying stored procedures. The following
is a typical order for this creation process.
- Database and permissions
- Tables
- Sequences
- Views
- Stored procedures and functions
- Triggers
The order of creation within your DDL statements may vary based
on database object dependencies. For example, you may have a function
that depends on a view, or vice versa, so you may need to list the
view first, for example.
LISTING 5-3 data-definition.sql: Sample Database Definition Script
for MySQL
____________________________________________________________________________
DROP DATABASE IF EXISTS brewery//
…
CREATE DATABASE IF NOT EXISTS brewery//
GRANT ALL PRIVILEGES ON *.* TO 'brewery'@'localhost' IDENTIFIED BY
'brewery' WITH GRANT OPTION//
GRANT ALL PRIVILEGES ON *.* TO 'brewery'@'%' IDENTIFIED BY 'brewery'
WITH GRANT OPTION//
USE brewery//
…
CREATE TABLE beer(id BIGINT(20) PRIMARY KEY, beer_name VARCHAR(50),
brewer VARCHAR(50), date_received DATE);
CREATE TABLE state(state CHAR(2), description VARCHAR(50));//
…
CREATE PROCEDURE beerCount(OUT count INT)
BEGIN
SELECT count(0) INTO count FROM beer;
END
//
____________________________________________________________________________
Technically Speaking…
You may find it easier to organize your targets and scripts by
database definition type (such as a table, view, and function) or
by subsystem (e.g., Property and Application).
____________________________________________________________________________
Manipulating Your Database
Once you've created a database from a build script, you'll need to provide
initial data (e.g., lookup tables) and test data for testing code that relies
on the database. This is where you supply the test data for your particular
environment or testing context. What's more, you may also find yourself
needing to use different SQL data files to support different environments,
like development, test, QA, and production environments.
The example in Listing 5-4 shows an Ant script pointing to a SQL
file, whose contents are inserted as test data into a database.
LISTING 5-4 build-database.xml: Manipulating Your Database Using an
Ant Script
____________________________________________________________________________
<target name="db:create" depends="filterSqlFiles" description="Create
the database definition">
<sql
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/brewery"
userid="brewery"
password="brewery"
classpathref="db.lib.path"
src="${filtered.sql.dir}/database-manipulation.sql"
delimiter=";"/>
</target>
The SQL script in Listing 5-5 represents test data. This is the script
that is referenced in Listing 5-4. In a typical script, you'll have many
more records than the three shown in Listing 5-5. Our intent is to give
you an idea of what the SQL scripts often execute. Tools like DbUnit
and NDbUnit4 can help seed the data that is inserted into and deleted
from a database as well.
LISTING 5-5 data-manipulation.sql: Sample Database Manipulation
Script for MySQL
____________________________________________________________________________
INSERT INTO beer(id, beer_name, brewer, date_received) VALUES (1,
'Liberty Ale','Anchor Brewing Company','2006-12-09');
INSERT INTO beer(id, beer_name, brewer, date_received) VALUES (2,
'Guinness Stout','St. James Gate Brewery','2006-10-23');
INSERT INTO state (state, description) VALUES('VT','Vermont');
INSERT INTO state (state, description) VALUES('VA','Virginia');
INSERT INTO state (state, description) VALUES('VI','Virgin Islands');
To achieve the benefits of automated database integration, you'll
need to provide scripts for inserting, updating, and deleting data.
These data manipulation scripts execute as part of an overall build process.
Next, we discuss how to tie these scripts together with the
orchestration script.
Creating a Build Database Orchestration Script
A database integration orchestration script executes the DDL and Data
Manipulation Language (DML) statements. Listing 5-6 shows an Ant
script that uses the sql task to call the data-definition.sql and datamanipulation.
sql files we created in Listing 5-3 and Listing 5-5. You'll
incorporate this orchestration into your higher-level build and integration
processes.
LISTING 5-6
build-database.xml: Database Integration Orchestration
Script Using Ant
____________________________________________________________________________
<target name="db:prepare" depends="db:create, db:insert"/>
<target name="db:create">
…
<target name="db:insert" depends="filterSqlFiles">
…
Are You on Autopilot?
As you are automating your database integration, a few things
may trip you up. It's easy for manual activities to unintentionally
accumulate in your database integration process. Try to resist this.
As Andrew Hunt and David Thomas mention in The Pragmatic
Programmer : Don't Repeat Yourself (or DRY, for short), keep your
build scripts "DRY." An easy form of "duplication" to miss is when
we get acclimated to clicking through the database vendor's GUI
application wizard rather than interfacing through the command
line where it can run scripted. Another potential problem is the tendency
to wait until there are many DDL/DML changes before committing
back to the version control repository. Database changes
can be pervasive, so try to make and check in small, incremental
changes to your database; this will make it easier to test and
debug.

Automated database integration
How to automate database integration
Using continuous database integration and a database sandbox
This chapter is an excerpt from the book, Continuous Integration: Improving Software Quality and Reducing Risk, authored by Paul Duvall, Steve Matyas, Andrew Glover. Published by Addison-Wesley Professional, June 2007. Copyright 2007. Pearson Education Inc., ISBN 0321336380. For an additional sample chapter, please visit Addison-Wesley. Safari Books Online subscribers can access the book here.