Navigation
SYSTEM ANALYSIS AND DESIGN File Organization and Database Design
Unit 1: File Organization and Database Design
1. Introduction
Every information system needs to store and retrieve data efficiently. The choice of how to organize data on storage media (disk, SSD) affects:
- Access speed (how fast we find a record)
- Storage efficiency (how much space is used)
- Flexibility (ability to change or query data in different ways)
- Concurrency (multiple users accessing simultaneously)
Two broad approaches:
- File-based systems – Data stored in operating system files, managed by application programs. (Legacy or simple systems.)
- Database systems – Data stored in a structured, integrated repository managed by a Database Management System (DBMS). (Modern approach for most business systems.)
Key idea: File organization is about physical arrangement of records on disk. Database design is about logical structure and relationships among data.
2. File Organization
File organization refers to the way records are physically arranged inside a file on secondary storage. The choice determines which access methods (sequential, direct, indexed) are efficient.
2.1 Basic Concepts
- Record – A collection of related data fields (e.g., one customer).
- File – A collection of records (e.g., all customers).
- Block – The smallest unit of data read/written from disk (typically 512 bytes to 64 KB).
- Key field – A field used to identify a record (e.g.,
Customer_ID). - Access method – The algorithm used to locate a record.
![File Organization Types Placeholder]()
2.2 Types of File Organization
| Organization | Description | Access Methods | Best for |
|---|---|---|---|
| Sequential | Records stored one after another in order of a key field. | Sequential read (from start to end). | Batch processing, report generation. |
| Indexed Sequential (ISAM) | Sequential file plus an index that maps key to disk address. | Sequential or direct by key (via index). | Transaction processing with occasional direct lookups. |
| Direct / Random (Hashing) | Record address computed from key using a hash function. | Direct access by key. | Real‑time systems where speed is critical (e.g., airline reservations). |
| Clustered | Records from multiple related files stored together (e.g., customer and his orders). | Sequential or indexed. | Reducing disk I/O for joins. |
| Heap / Unordered | Records appended in order of arrival; no ordering. | Sequential scan only. | Log files, temporary data. |
2.3 Detailed Description of Each Organization
2.3.1 Sequential File Organization
- Structure: Records are stored contiguously, sorted by a key field (e.g., employee number).
- Access: To find a record, you must read from the beginning until the key matches.
- Operations: Insert/Delete requires rewriting the entire file. Update can be done in place if record size is fixed.
- Advantages: Simple, efficient for batch processing (e.g., payroll). High storage efficiency.
- Disadvantages: Slow for random access; poor for frequent inserts/deletes.
2.3.2 Indexed Sequential File Organization (ISAM)
- Structure: A sequential file (data records) plus one or more indexes. The index maps key values to disk addresses.
- Access: Search the index (fast, because index is small and ordered) then read that block.
- Advantages: Supports both sequential and random access efficiently.
- Disadvantages: Index adds storage and maintenance overhead.
![B-Tree Structure Placeholder]()
2.3.3 Direct / Hashing File Organization
- Structure: Records are placed in buckets (blocks) based on the output of a hash function:
Bucket_Address = hash(key) % number_of_buckets. - Collisions: When two keys hash to the same bucket. Handled by chaining or open addressing.
- Advantages: Extremely fast for exact‑key lookups (O(1) average).
- Disadvantages: Inefficient for range queries. Not sorted. Poor for sequential access.
![Hashing Mechanism Placeholder]()
2.3.4 Clustered File Organization
- Structure: Records from related tables are stored in the same disk blocks (e.g., a customer and all their orders).
- Advantages: Dramatically improves join performance.
- Disadvantages: May slow down operations on the “many” side alone.
2.4 Comparison Table
| Feature | Sequential | Indexed | Direct (Hash) | Clustered |
|---|---|---|---|---|
| Sequential access | Very fast | Fast | Slow | Fast |
| Random by key | Slow | Fast | Very fast | Fast |
| Range queries | Fast | Fast | Very slow | Depends |
| Storage overhead | None | Index size | None | May waste space |
3. Database Design
Database design deals with the logical structure of data, including entities, attributes, relationships, and constraints.
3.1 Design Phases
| Phase | Output | Description |
|---|---|---|
| Conceptual design | ER diagram (Entity‑Relationship) | Technology‑independent model of data requirements. |
| Logical design | Relational schema (tables, keys) | Transform ER diagram into tables; apply normalization. |
| Physical design | Storage specs (indexes, partitioning) | Optimize for performance on a specific DBMS. |
3.2 Conceptual Design (ER Modeling)
- Entity: A distinct object (e.g.,
Customer). - Attribute: A property of an entity (e.g.,
Name). - Relationship: An association (e.g.,
places).
![Library ER Diagram Placeholder]()
3.3 Normalization
Normalization is the process of eliminating data redundancy and anomalies (insert, update, delete) by decomposing tables.
![Normalization Steps Placeholder]()
| Normal Form | Rule | Problem it solves |
|---|---|---|
| 1NF | Atomic values; no repeating groups. | Repeating groups cause storage inefficiency. |
| 2NF | 1NF + every non‑key attribute depends on the whole primary key. | Partial dependencies cause redundancy. |
| 3NF | 2NF + no transitive dependency (depends only on the PK). | Update anomalies. |
4. File Organization vs. Database Design
| Aspect | File Organization | Database Design |
|---|---|---|
| Level | Physical | Conceptual / Logical |
| What it defines | How records are placed on disk. | Entities, relationships, constraints. |
| Tools | File system, ISAM, hashing. | DBMS (Oracle, MySQL, SQL Server). |
| Change frequency | Relatively stable (tuned). | Changes with business rules. |
5. Summary and Key Takeaways
- File organization is about physical storage and access methods. Choose based on access patterns (sequential, indexed, hashed).
- Database design is about logical structure (entities, normalization). Standard is 3NF for transactional systems.
- Indexes are the primary tool for speeding up queries but slow down writes.
- In modern SAD, most systems use a relational DBMS with B‑tree indexes, but understanding these principles is essential for performance tuning.
