Effectively using Associative Tables in Relational Database Design


When creating entities in an entity-relational (ER) diagram, there are times where multiple entity attributes (also known as fields in tables) have associations to could create redundancy in the entity instances of a relational database model.  An associative entity is an entity type that associates the instances of one or more entity types that contains attributes in the relationships between those entities.   Associative entities  helps prevent joining two or more entities directly with multiple relationships that would create duplication of instances.  In databases this can sometime result in Cartesian joins and can produce duplication in table rows; sometimes many times over in a result set.

In entities where the instances are not very unique, such as the days of the week, months in a year, and positions in a company; it’s not a good idea to create a direct association with attributes on that entity to another entity that has very unique instances.  For example, meetings in a company, full names, age and address of people in New York City, or people working in a company are examples of uniqueness.  Although these are a very basic example, you will soon discover that you could almost create a level of duplication that can exponentially grow the number of instances in the design of your database model.

Below is an use case for an associative entity.  The EMPLOYEE  and SKILLS tables are the entities that contain the employee information and types of skills and titles within the company, respectively.  In this example a company ranks it’s employee’s skills by titles such as Principal, Senior, III, II,  and I.  However, they also want to know which skills are managed by which employees.  Since we want to prevent as much duplication as possible, we create an associative table called ORGANIZATION that will have a primary identifier attribute (ORG_ID) and joins together the EMPLOYEE and SKILLS table. The MANAGED_SKILL_ID attribute would create a composition identifier that would allow an employee to have multiple instances of the skills for which he or she would manage without duplicating data in the EMPLOYEE or SKILLS entities.



Leveraging Memory in PL/SQL for Big Data Applications

When writing PL/SQL code for Big Data applications it’s always a good idea to leverage the database as much as possible.  This is particularly true for ETL or warehousing processing.

In my experience, I’ve tried to quantify objects in Oracle programming that work best to temporarily  store data.  As part of your Big Data application, anytime you need to leverage relational databases, here is a table that I typically use as a guideline for storing data for loading to or extracting data from a database.

Typically, I judge the what type of cursor object to used based on factors such as performance, the number of rows that need to be read, the operation that I am performing and how much memory that I would like to leverage.

Number of Rows Explicit Cursor  Temporary Table PL/SQL Memory Collections Record Collections  Typical Memory Used
<= 1000 check_mark7 check_mark7 10 MB
1000 to 500,000 check_mark7 check_mark7 check_mark7 100-500 MB
500,00 to 2,000,000+ check_mark7 check_mark7 500MB +

Although, explicit cursors can be used for much larger data sizes, if performance is your goal, a few thousand rows are optimal to maintain good performance, especially if you are opening and closing many cursors in a small window of time.