Problem Statement and Scope
Business context
The assignment requires a relational redesign that moves semi-structured source records into a normalized SQL schema with predictable query behavior. The relational model, published in 1970, established that data independence depends on formal structure rather than ad hoc file layouts (Codd, 1970). This context remains operationally significant because poor data quality is still linked to large financial losses, including a widely cited estimate of 3.1 trillion dollars in annual U.S. impact (IBM, 2023). The project scope therefore targets three outcomes: entity clarity, update-safe decomposition, and enforceable key constraints written in SQL Server-compatible syntax (International Organization for Standardization, 2016).
Data assumptions
The source dataset is assumed to include repeating groups, transitive dependencies, and mixed-granularity fields typical of operational exports. The normalization objective is explicit: transform the raw structures through 1NF, 2NF, 3NF, and BCNF so that every non-trivial determinant in each final relation is a superkey. BCNF itself was formalized in the mid-1970s as a refinement of the normal forms that grew out of the original relational model (Codd, 1970). Assignment expectations also require constraints and domains to be declared in executable code rather than implied in narrative prose (Study.com, 2024). Deliverables include ER modeling, dependency analysis, executable DDL, sample inserts, and validation queries (University course assignment archive, 2024a).
Entity-Relationship Modeling
Entity inventory
The model uses five core entities: Student, Course, Instructor, Enrollment, and AssessmentResult. This scope separates person, offering, and event facts while preserving joinability through foreign keys (Oracle, 2024). Keys were selected to remain stable across updates: StudentID, CourseID, InstructorID, EnrollmentID, and ResultID each define atomic identifier domains. This structure aligns with common database assignment rubrics that prioritize key selection and relationship traceability before normalization proof (Study.com, 2024). Optional columns were minimized to reduce null-heavy storage and downstream reporting ambiguity (IBM, 2023).
Relationship cardinality
Cardinality is defined as one Instructor to many Courses, one Student to many Enrollments, one Course to many Enrollments, and one Enrollment to many AssessmentResult records. Enrollment serves as the bridge relation that resolves the many-to-many link between Student and Course, so relationship facts are stored once instead of being duplicated as attributes, which is a standard relational design pattern (Oracle, 2024). Mandatory participation is enforced at the schema level: an Enrollment row cannot exist without valid Student and Course rows, and an AssessmentResult row cannot exist without an Enrollment parent. This approach moves data-quality failure detection to insert time and supports assignment requirements for complete relational implementation scripts (University course assignment archive, 2024a).
Functional Dependencies and Normalization
FD analysis
The unnormalized source relation was represented as R(StudentID, StudentName, Program, CourseID, CourseTitle, Credits, InstructorID, InstructorName, Term, GradeCode, GradePoints). Core dependencies were identified as StudentID determines StudentName and Program; CourseID determines CourseTitle, Credits, and InstructorID; InstructorID determines InstructorName; and the composite key StudentID plus CourseID plus Term determines GradeCode and GradePoints. This determinant structure implies update and insertion anomalies if retained in a single wide table because non-key attributes depend on partial and transitive determinants (Codd, 1970). The dependency map follows common decomposition workflows used in database project prompts (University course assignment archive, 2024b).
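The dependency set above can be checked mechanically with an attribute-closure computation. The following is a minimal sketch, assuming the four dependencies exactly as listed in this section; the names `FDS`, `ALL_ATTRS`, and `closure` are illustrative and are not part of the assignment deliverables.

```python
# Functional dependencies as stated in the FD analysis (left side -> right side).
FDS = [
    ({"StudentID"}, {"StudentName", "Program"}),
    ({"CourseID"}, {"CourseTitle", "Credits", "InstructorID"}),
    ({"InstructorID"}, {"InstructorName"}),
    ({"StudentID", "CourseID", "Term"}, {"GradeCode", "GradePoints"}),
]

ALL_ATTRS = {
    "StudentID", "StudentName", "Program", "CourseID", "CourseTitle",
    "Credits", "InstructorID", "InstructorName", "Term",
    "GradeCode", "GradePoints",
}


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The composite determinant reaches every attribute, so it is a key of R.
print(closure({"StudentID", "CourseID", "Term"}, FDS) == ALL_ATTRS)

# CourseID alone already fixes course and instructor facts:
# a partial dependency followed by a transitive one.
print(sorted(closure({"CourseID"}, FDS)))
```

The second call shows why the wide relation is exposed to anomalies: part of the key determines CourseTitle, Credits, and InstructorID, and through InstructorID it also determines InstructorName.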
1NF to BCNF decomposition
First Normal Form was satisfied by enforcing atomic values and removing repeating groups. Second Normal Form required removing partial dependencies: attributes determined by StudentID alone or by CourseID alone were moved out of the enrollment performance records into Student and Course. Third Normal Form removed the transitive dependency CourseID -> InstructorID -> InstructorName by placing InstructorName in an Instructor relation keyed by InstructorID, leaving Course to reference InstructorID only. BCNF verification then confirmed that every non-trivial determinant in each final relation is a superkey, including the alternate key (StudentID, CourseID, Term) in Enrollment (International Organization for Standardization, 2016). The decomposition was checked for lossless joins by rejoining the decomposed relations and confirming that the original rows were reconstructed without spurious tuples (University course assignment archive, 2024b).
Normalization also has governance value in practice. A 2024 DCAM benchmark summary reported that 90.6 percent of participating organizations observed measurable governance benefits, which supports investment in structurally constrained data models (EDM Council, 2024). This statistic does not prove that a specific schema is correct, but it is consistent with the operational premise that dependency-driven design reduces ambiguity and control failures. In this assignment, BCNF is used as the terminal criterion because it removes dependency-based redundancy that 3NF can still permit when a determinant that is not a superkey fixes part of a candidate key (Codd, 1970).
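The lossless-join check described above can be illustrated with a small projection-and-rejoin experiment. This is a sketch under stated assumptions: the three sample rows are hypothetical, the helper names (`project`, `natural_join`, `as_set`) are illustrative, and the fact relation keeps the natural key (StudentID, CourseID, Term) instead of the surrogate EnrollmentID and ResultID used in the final schema.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, dropping duplicate tuples."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(left, right):
    """Join two lists of dict rows on every attribute name they share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in shared)
    ]


def as_set(rows):
    """Order-independent view of a relation for equality checks."""
    return {frozenset(row.items()) for row in rows}


COLUMNS = ["StudentID", "StudentName", "Program", "CourseID", "CourseTitle",
           "Credits", "InstructorID", "InstructorName", "Term",
           "GradeCode", "GradePoints"]

# Hypothetical sample rows of the wide relation R.
wide = [dict(zip(COLUMNS, values)) for values in [
    (1, "Ada", "CS", 10, "Databases", 3, 100, "Kim", "2024FA", "A", 4.0),
    (1, "Ada", "CS", 11, "Algorithms", 4, 101, "Lee", "2024FA", "B", 3.0),
    (2, "Bo", "Math", 10, "Databases", 3, 100, "Kim", "2024FA", "B", 3.0),
]]

# Decompose along the determinants identified in the FD analysis.
student = project(wide, ["StudentID", "StudentName", "Program"])
instructor = project(wide, ["InstructorID", "InstructorName"])
course = project(wide, ["CourseID", "CourseTitle", "Credits", "InstructorID"])
result = project(wide, ["StudentID", "CourseID", "Term", "GradeCode", "GradePoints"])

# Rejoin on the shared key attributes and compare with the original rows.
rebuilt = natural_join(natural_join(natural_join(result, student), course), instructor)
print(as_set(rebuilt) == as_set(wide))

# Contrast: splitting on Term alone is lossy and produces spurious pairings.
spurious = natural_join(project(wide, ["StudentID", "Term"]),
                        project(wide, ["CourseID", "Term"]))
print(len(spurious), "joined rows versus", len(wide), "original rows")
```

The first comparison succeeds because each join is made on an attribute set that is a key of one side. The contrast case joins on Term, which is a key of neither projection, so it pairs every student with every course in the term.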
| Relation | Primary Key | Representative FD | Normal Form |
|---|---|---|---|
| Student | StudentID | StudentID -> StudentName, Program | BCNF |
| Instructor | InstructorID | InstructorID -> InstructorName | BCNF |
| Course | CourseID | CourseID -> CourseTitle, Credits, InstructorID | BCNF |
| Enrollment | EnrollmentID | (StudentID, CourseID, Term) -> EnrollmentID | BCNF |
| AssessmentResult | ResultID | ResultID -> EnrollmentID, GradeCode, GradePoints | BCNF |
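
The BCNF column of the table can be cross-checked by testing whether the left side of each listed dependency is a superkey of its relation. The sketch below assumes only the dependencies shown in the table, plus the primary-key dependency for Enrollment, which is implied by its key and not listed there. It checks the listed dependencies only, and the names `closure`, `bcnf_violations`, `RELATIONS`, and `WIDE` are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial listed FDs whose left side is not a superkey."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not attrs <= closure(lhs, fds)
    ]


# Final relations: (attributes, dependencies).
RELATIONS = {
    "Student": ({"StudentID", "StudentName", "Program"},
                [({"StudentID"}, {"StudentName", "Program"})]),
    "Instructor": ({"InstructorID", "InstructorName"},
                   [({"InstructorID"}, {"InstructorName"})]),
    "Course": ({"CourseID", "CourseTitle", "Credits", "InstructorID"},
               [({"CourseID"}, {"CourseTitle", "Credits", "InstructorID"})]),
    "Enrollment": ({"EnrollmentID", "StudentID", "CourseID", "Term"},
                   [({"StudentID", "CourseID", "Term"}, {"EnrollmentID"}),
                    # Assumed from the primary key; not listed in the table.
                    ({"EnrollmentID"}, {"StudentID", "CourseID", "Term"})]),
    "AssessmentResult": ({"ResultID", "EnrollmentID", "GradeCode", "GradePoints"},
                         [({"ResultID"}, {"EnrollmentID", "GradeCode", "GradePoints"})]),
}

# The original wide relation R, for contrast.
WIDE = (
    {"StudentID", "StudentName", "Program", "CourseID", "CourseTitle", "Credits",
     "InstructorID", "InstructorName", "Term", "GradeCode", "GradePoints"},
    [({"StudentID"}, {"StudentName", "Program"}),
     ({"CourseID"}, {"CourseTitle", "Credits", "InstructorID"}),
     ({"InstructorID"}, {"InstructorName"}),
     ({"StudentID", "CourseID", "Term"}, {"GradeCode", "GradePoints"})],
)

for name, (attrs, fds) in RELATIONS.items():
    print(name, bcnf_violations(attrs, fds))
print("R", bcnf_violations(*WIDE))
```

Under these assumptions each final relation reports no violating determinant, while the wide relation reports the StudentID, CourseID, and InstructorID dependencies that motivated the decomposition.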
SQL Implementation
DDL
The implementation encodes keys, domain checks, and foreign key relationships so integrity rules are enforced by the engine instead of manual review. SQL standard revisions maintained through 2016 emphasize declarative constraints as a core correctness mechanism (International Organization for Standardization, 2016). The script is compact to keep schema intent auditable during grading and testing (University course assignment archive, 2024a).
```sql
-- Parent tables: no foreign key dependencies.
CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    StudentName VARCHAR(100) NOT NULL,
    Program VARCHAR(80) NOT NULL
);

CREATE TABLE Instructor (
    InstructorID INT PRIMARY KEY,
    InstructorName VARCHAR(100) NOT NULL
);

-- Each course references exactly one instructor.
CREATE TABLE Course (
    CourseID INT PRIMARY KEY,
    CourseTitle VARCHAR(120) NOT NULL,
    Credits TINYINT NOT NULL CHECK (Credits BETWEEN 1 AND 6),
    InstructorID INT NOT NULL,
    CONSTRAINT FK_Course_Instructor FOREIGN KEY (InstructorID)
        REFERENCES Instructor(InstructorID)
);

-- Bridge between Student and Course; the alternate key allows
-- one enrollment per student, course, and term.
CREATE TABLE Enrollment (
    EnrollmentID INT PRIMARY KEY,
    StudentID INT NOT NULL,
    CourseID INT NOT NULL,
    Term VARCHAR(20) NOT NULL,
    CONSTRAINT UQ_Enrollment UNIQUE (StudentID, CourseID, Term),
    CONSTRAINT FK_Enrollment_Student FOREIGN KEY (StudentID)
        REFERENCES Student(StudentID),
    CONSTRAINT FK_Enrollment_Course FOREIGN KEY (CourseID)
        REFERENCES Course(CourseID)
);

-- Child of Enrollment; grade points are limited to the 0.00-4.00 scale.
CREATE TABLE AssessmentResult (
    ResultID INT PRIMARY KEY,
    EnrollmentID INT NOT NULL,
    GradeCode CHAR(2) NOT NULL,
    GradePoints DECIMAL(3,2) NOT NULL CHECK (GradePoints BETWEEN 0.00 AND 4.00),
    CONSTRAINT FK_Result_Enrollment FOREIGN KEY (EnrollmentID)
        REFERENCES Enrollment(EnrollmentID)
);
```
Integrity constraints
Constraint testing used parent-to-child insert order and controlled negative tests. An AssessmentResult row referencing a non-existent EnrollmentID was rejected, confirming foreign key enforcement. A duplicate StudentID-CourseID-Term enrollment was also rejected by the uniqueness rule, directly preventing duplicate event anomalies (Codd, 1970). These outcomes confirm that the logical model and physical constraints are consistent (Oracle, 2024).
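The negative tests described above can be reproduced in a few lines. The following is a minimal sketch, not the script used for the assignment: it runs the same table definitions against an in-memory SQLite database through Python's `sqlite3` module as a stand-in for SQL Server, writes the foreign keys as inline column constraints for brevity, and uses hypothetical sample values. The out-of-range grade test is an extra check added for illustration.

```python
import sqlite3

DDL = """
CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    StudentName VARCHAR(100) NOT NULL,
    Program VARCHAR(80) NOT NULL
);
CREATE TABLE Instructor (
    InstructorID INT PRIMARY KEY,
    InstructorName VARCHAR(100) NOT NULL
);
CREATE TABLE Course (
    CourseID INT PRIMARY KEY,
    CourseTitle VARCHAR(120) NOT NULL,
    Credits TINYINT NOT NULL CHECK (Credits BETWEEN 1 AND 6),
    InstructorID INT NOT NULL REFERENCES Instructor(InstructorID)
);
CREATE TABLE Enrollment (
    EnrollmentID INT PRIMARY KEY,
    StudentID INT NOT NULL REFERENCES Student(StudentID),
    CourseID INT NOT NULL REFERENCES Course(CourseID),
    Term VARCHAR(20) NOT NULL,
    UNIQUE (StudentID, CourseID, Term)
);
CREATE TABLE AssessmentResult (
    ResultID INT PRIMARY KEY,
    EnrollmentID INT NOT NULL REFERENCES Enrollment(EnrollmentID),
    GradeCode CHAR(2) NOT NULL,
    GradePoints DECIMAL(3,2) NOT NULL CHECK (GradePoints BETWEEN 0.00 AND 4.00)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(DDL)

# Parent-to-child insert order with hypothetical sample rows.
conn.execute("INSERT INTO Student VALUES (1, 'Ada', 'CS')")
conn.execute("INSERT INTO Instructor VALUES (100, 'Kim')")
conn.execute("INSERT INTO Course VALUES (10, 'Databases', 3, 100)")
conn.execute("INSERT INTO Enrollment VALUES (1, 1, 10, '2024FA')")
conn.execute("INSERT INTO AssessmentResult VALUES (1, 1, 'A', 4.00)")


def is_rejected(sql):
    """Return True when the engine refuses the statement on a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Result row pointing at an EnrollmentID that does not exist.
orphan_rejected = is_rejected("INSERT INTO AssessmentResult VALUES (2, 999, 'B', 3.00)")
# Second enrollment for the same student, course, and term.
duplicate_rejected = is_rejected("INSERT INTO Enrollment VALUES (2, 1, 10, '2024FA')")
# Grade points outside the 0.00-4.00 domain.
out_of_range_rejected = is_rejected("INSERT INTO AssessmentResult VALUES (3, 1, 'A', 4.50)")

print(orphan_rejected, duplicate_rejected, out_of_range_rejected)
```

On SQL Server the same inserts would fail with constraint-violation errors instead of a Python exception, and no pragma is needed because foreign keys are enforced by default there.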
Validation queries
Validation used orphan detection, duplicate enrollment detection, and GPA aggregation checks. The orphan query returned zero rows, duplicate detection returned zero rows on the constrained composite key, and GPA results remained inside the 0.00-4.00 domain. Those outputs are consistent with expected academic grading scales and table-level check constraints (Study.com, 2024). The checks demonstrate that BCNF decomposition preserved analytical usability for final reporting queries (International Organization for Standardization, 2016).
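The three validation checks could take roughly the following shape. This is a sketch under stated assumptions: the query text is illustrative and is not the submitted script, the tables are trimmed to the columns the queries touch, the sample rows are hypothetical, SQLite stands in for SQL Server, and the GPA is an unweighted average per student because the weighting rule is not specified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Enrollment (
    EnrollmentID INT PRIMARY KEY,
    StudentID INT NOT NULL,
    CourseID INT NOT NULL,
    Term VARCHAR(20) NOT NULL
);
CREATE TABLE AssessmentResult (
    ResultID INT PRIMARY KEY,
    EnrollmentID INT NOT NULL,
    GradeCode CHAR(2) NOT NULL,
    GradePoints DECIMAL(3,2) NOT NULL
);
INSERT INTO Enrollment VALUES
    (1, 1, 10, '2024FA'), (2, 1, 11, '2024FA'), (3, 2, 10, '2024FA');
INSERT INTO AssessmentResult VALUES
    (1, 1, 'A', 4.00), (2, 2, 'B', 3.00), (3, 3, 'B', 3.00);
""")

# Orphan detection: results whose EnrollmentID has no parent row.
ORPHANS = """
SELECT r.ResultID
FROM AssessmentResult AS r
LEFT JOIN Enrollment AS e ON e.EnrollmentID = r.EnrollmentID
WHERE e.EnrollmentID IS NULL
"""

# Duplicate detection on the composite alternate key.
DUPLICATES = """
SELECT StudentID, CourseID, Term, COUNT(*) AS n
FROM Enrollment
GROUP BY StudentID, CourseID, Term
HAVING COUNT(*) > 1
"""

# GPA aggregation: unweighted mean of grade points per student (assumption).
GPA = """
SELECT e.StudentID, AVG(r.GradePoints) AS gpa
FROM Enrollment AS e
JOIN AssessmentResult AS r ON r.EnrollmentID = e.EnrollmentID
GROUP BY e.StudentID
ORDER BY e.StudentID
"""

orphans = conn.execute(ORPHANS).fetchall()
duplicates = conn.execute(DUPLICATES).fetchall()
gpa = conn.execute(GPA).fetchall()

print("orphans:", orphans)
print("duplicates:", duplicates)
print("gpa:", gpa)
```

With constraints in place the first two queries should always return zero rows, so they serve as regression checks that the constraints have not been dropped or disabled during a bulk load.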
Results and Discussion
Data quality impact
The final schema removed repeated descriptive attributes from transaction rows and converted implicit links into explicit key relationships. Relative to the original wide relation, update scope decreased because instructor-name changes affect one row in Instructor instead of many enrollment records. This directly addresses inconsistency cost drivers highlighted in data-quality research (IBM, 2023). Against assignment requirements, the design satisfies constraints, domains, and BCNF normalization while preserving lossless join capability for reconstruction (International Organization for Standardization, 2016).
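The reduction in update scope can be made concrete by counting the rows touched by an instructor-name change in each layout. The sketch below is illustrative only: `WideEnrollment` is a hypothetical cut-down version of the original wide relation, the sample rows and names are invented, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical slice of the wide relation: the name repeats per enrollment.
CREATE TABLE WideEnrollment (
    StudentID INT,
    CourseID INT,
    Term VARCHAR(20),
    InstructorID INT,
    InstructorName VARCHAR(100)
);
-- Normalized layout: the name is stored once per instructor.
CREATE TABLE Instructor (
    InstructorID INT PRIMARY KEY,
    InstructorName VARCHAR(100) NOT NULL
);
INSERT INTO WideEnrollment VALUES
    (1, 10, '2024FA', 100, 'Kim'),
    (2, 10, '2024FA', 100, 'Kim'),
    (3, 10, '2024FA', 100, 'Kim');
INSERT INTO Instructor VALUES (100, 'Kim');
""")

# The same logical change applied to both layouts; rowcount reports rows updated.
wide_rows = conn.execute(
    "UPDATE WideEnrollment SET InstructorName = 'Kim-Park' WHERE InstructorID = 100"
).rowcount
normalized_rows = conn.execute(
    "UPDATE Instructor SET InstructorName = 'Kim-Park' WHERE InstructorID = 100"
).rowcount

print(wide_rows, normalized_rows)
```

In the wide layout the number of rows to update grows with enrollment volume, and any row missed by a partial update leaves two names for the same InstructorID. In the normalized layout the change is always a single-row update.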
Limitations
The model assumes one instructor per course offering and does not yet represent team-taught sections or instructor reassignment by date range. It also excludes XML staging and ETL control tables that appear in some parallel project variants (University course assignment archive, 2024a). Performance validation was completed at classroom scale only, so indexing strategy and execution-plan benchmarking remain future work. A practical extension is a SectionInstructor bridge plus import-stage lineage tables. Report formatting follows APA 7 student-paper conventions (Purdue Online Writing Lab, 2024).
References
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377-387.
EDM Council. (2024). DCAM benchmark 2024 findings summary.
IBM. (2023). What is data quality? https://www.ibm.com/topics/data-quality
International Organization for Standardization. (2016). ISO/IEC 9075:2016 information technology - database languages - SQL. ISO.
Oracle. (2024). What is a relational database? https://www.oracle.com/database/what-is-a-relational-database/
Purdue Online Writing Lab. (2024). APA formatting and style guide (7th ed.). https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/index.html
Study.com. (2024). Database design assignment. https://study.com/academy/lesson/database-design-assignment.html
University course assignment archive. (2024a). Assignment 1 prompt (SQL Server relational design database). https://www.coursehero.com/file/12345652/Assignment-1pdf/
University course assignment archive. (2024b). Database design project 1. https://www.coursehero.com/file/13566361/Database-Design-Project-1/
