Sunday, January 23, 2011

Entity-Relationship Modeling

Entity-Relationship Diagrams (ERDs) attempt to accomplish two things early in the design of a database. First they document the entities of interest to the enterprise. Entities may represent things (such as a customer, an
employee, or a product) or an event (like a registration, a shipment, or an order). Entities have identifying
characteristics and additional descriptive characteristics. For instance, an employee has an employee number,
a name, a home address, a work location, a supervisor, a hire date, a pay rate, and so on. One of the challenges
during design will be sorting through all of the available data “thingies” to figure out which are entities, which
are identifiers of entity instances, and which are descriptive of particular instances of an entity.
In addition to documenting entities, an ERD documents the relationship between entities. From the list
of example entities in the previous paragraph, you might conclude that there is a relationship between a
customer and a product. It might be direct (a CUSTOMER owns a PRODUCT) or it might be more complex
set of relationships:
❑ each CUSTOMER may place one or more ORDERs
❑ each ORDER must include one or more PRODUCTs
❑ each SHIPMENT must include at least one PRODUCT to one and only one CUSTOMER
❑ each ORDER may be fulfilled with one or more SHIPMENTs
❑ a SHIPMENT may include parts of one or more ORDERs for one and only CUSTOMER

This list is an example of the relationship rules that might exist for a company. Another company with
the same entities may have very different relationships and rules. For instance, an alternative relationship
might be that “a SHIPMENT fulfills one and only one ORDER”. Sorting through these relationships and the
business rules that define them is the essence of designing a database that will meet the requirements of
a specific business. Getting one rule wrong during logical design can lead to a database that will not be
able to support necessary business processes.
Setting aside the order entry discussion, see Figure 4-1 for a quick illustration of the ERD modeling
technique for documenting a logical database design that might be part of a (highly simplified) insur-
ance billing system. Each box designates an entity . . . an object of importance to the enterprise. The ERD
doesn’t attempt to show all of the low-level data elements (attributes) associated with each entity. At
most, identifying elements (keys) are shown. A total of seven entities have been shown in this example.

While we might have shown INSURED and POLICY_OWNER as completely distinct entities, this model
assumes that there are many characteristics of the two that overlap, so they have been shown as two
subtypes of the PERSON entity.

As we said, the second aspect of an ERD is to document the relationships between entities, as shown by
the connecting lines. One line is drawn for each relationship; every entity should be expected to have at
least one relationship to some other entity, but not all entities are necessarily related to every other entity.
It is possible for two entities to have more than one relationship.

Further, it should be recognized that every relationship exists in two directions. If entity Ahas a relationship
to entity B, then entity B has a relationship to entity A. Asingle line defines this bidirectional relationship,
but it is useful to define the relationship in both directions. Many methodologies and tools are not strict
in this regard, but it helps when reading an ERD to be able to traverse the relationships without having
to translate on the fly. In Figure 4-1, the relationship between POLICY and CLAIM entities can be stated in
two ways: (1) A POLICY may have one or more CLAIMs submitted and (2) A CLAIM must be submitted
against exactly one POLICY.
The nature of each relationship may be optional or mandatory in one direction or both. In this statement
of the bi-directional relationship between the POLICY and CLAIM entities, the term “MUST” indicates
that the relationship is mandatory in that direction (a CLAIM cannot exist without an associated POLICY.)
A solid line in the ERD indicates that the relationship, in that direction, is mandatory.
An optional relationship exists in the other direction—the term “MAY” informs us that a POLICY is
allowed to exist without any associated CLAIMs. In the ERD, a broken line indicates that end of the line
to indicate that the relationship from that direction is optional.
The degree of each relationship, additionally, may be a one-to-one correspondence between the entities
or, more commonly, a one-to-many relationship. Where many occurrences of an entity may exist for a
relationship, a “crow’s foot” is used to designate the many end of the one-to-many relationship, such as
the fact that multiple CLAIMs are allowed for a POLICY. The written description of the relationship uses the
“one or more” phrase to designate this.
Where only one occurrence of an entity may participate in the particular relationship, no crow’s foot is
used and we’ll describe the relationship using “one and only one” or “exactly one” terminology.

Many-to-many relationships do occur in the real world, and during this logical design phase, they may
appear in our ERD. Figure 4-1 showed such a relationship between CLAIM and TREATMENT. Many-to-many
relationships, however, present difficulties in creating a physical database. Remember that a relational
database makes its connection between tables using data values themselves rather than pointers. We will
eventually, during physical design, implement a one-to-many relationship by carrying the key value of
the “one” side as a data value stored within each row on the “many” side table. Many-to-many relationships
can’t be stored that way, so data modelers will resolve the many-to-many relationship into two one-to-
many relationships and by creating a new connecting entity to help define the relationship. This issue
was ignored in the preliminary example, but Figure 4-2 shows that specific portion of our total ERD with
this relationship resolved.

Another common construct included within an ERD are “Type of” subentities exemplified by INSURED
and POLICY_OWNER that have been shown as subtypes of the PERSON entity. At a later revision of this
ERD, some differences in types of PROVIDERs might have been discovered that would lead to a similar
subtyping for doctors, hospitals, physical therapists, and pharmacies.
One other ERD diagramming convention shows “one of” choices among multiple relationships. This option
is shown by the arc crossing the relationship lines connecting the CLAIM entity to DENIAL and PAYMENT with
an arc. According to our simple ERD, either (but not both) of these may exist for any given CLAIM.
The primary advantage of ERD diagramming is that ERDs are easily explained to nontechnical users and
sponsors of a proposed system. Most business people can easily validate the important entities, their identifying
attributes, and the rules that define the relationships between the entities in their real world. Asecond advan-
tage is that ERDs are very easily translated from their logical representation to a preliminary physical design of
a relational database. It is, after all, a relational technique. The third advantage of ERD modeling is that ERDs
can scale to show the total information model of an entire enterprise or a restricted view can present just the
subset of entities of interest to a particular system. (Pulling a small part of the entire ERD, as in Figure 4-2, is an
example of this capability.) This allows enterprise data architects a wonderful way to “zoom” in and out to
examine global relationships or detailed implications of a proposed change in one particular program.
The disadvantage of ERD modeling is that it is primarily a data-focused methodology and doesn’t
attempt to model the processing needs of the application. Processing will have to be designed to support
each relationship, but the ERD doesn’t suggest how that processing will be handled. ERDs, as valuable
as they are, require a complementary design and documentation technique to handle processing needs
and module decomposition.

No comments:

Post a Comment