Navigation
SYSTEM ANALYSIS AND DESIGN Data Dictionary
Unit 1: Data Dictionary
1. What is a Data Dictionary?
A Data Dictionary (also called data repository or metadata repository) is a centralized collection of definitions for all data elements, data flows, data stores, records, and processes within a system.
Analogy: If a DFD is a map showing roads (data flows) and buildings (processes/data stores), the Data Dictionary is the legend and address book that explains what each symbol means, what data travels on each road, and the structure of each building.
1.1 Why is a Data Dictionary Needed?
| Problem without Data Dictionary | Solution with Data Dictionary |
|---|---|
| Ambiguous data names (“customer” means different things to sales and billing) | Single, agreed definition. |
| Duplicate data elements stored inconsistently | Shows where data is stored and its format. |
| Missing or incorrect data flows in DFD | Data flows must be defined in dictionary. |
| Difficulty estimating storage needs | Dictionary records data lengths and volumes. |
| Inconsistent field attributes | Standardized data lengths, types, and validations. |
| Onboarding challenges | Provides a complete reference for new team members. |
1.2 When is a Data Dictionary Created?
- During analysis: As DFDs are drawn, each data flow and data store is defined.
- Maintained throughout design: Evolving as technology choices are made.
- Delivered as documentation: Part of the final SRS for the operations and maintenance teams.
2. Core Components of a Data Dictionary
A complete data dictionary defines four types of entries:
| Entry Type | Definition | Example |
|---|---|---|
| Data Element | Smallest unit of data (atomic); cannot be broken down further. | Customer_ID, Order_Date, Quantity |
| Data Flow | A collection of data elements moving together. | Order (contains Customer_ID, Order_Date, {Product}) |
| Data Store | A collection of data records (e.g., file, database table). | Customer_File (contains many Customer records) |
| Process | A transformation; includes input flows, output flows, and logic description. | Calculate_Tax (inputs: Subtotal, Tax_Code; outputs: Tax_Amount) |
3. Data Element Definition
A data element (also called field, attribute, data item) is the most detailed level of data.
3.1 Attributes to Document for Each Data Element
| Attribute | Description | Example |
|---|---|---|
| Name | Unique identifier, often in uppercase with underscores | CUSTOMER_ID |
| Alias | Other names used in the organization | CustID, ClientNumber |
| Type | Data type: alphanumeric, numeric, date, boolean, etc. | Numeric (integer) |
| Length | Maximum number of characters or digits | 10 (for CUSTOMER_ID) |
| Range / Format | Valid values or pattern | 10000–99999; format: 5-digit number |
| Default value | Value if none provided | 0 for QUANTITY |
| Null allowed? | Can this field be empty? | No (NOT NULL) |
| Validation rule | Business rule for validity | Must exist in Customer master |
| Source | Where does this data originate? | User input from registration screen |
| Derivation | Formula if calculated | TOTAL = PRICE * QUANTITY |
3.2 Example Data Element Definitions
| Name | Type | Length | Range | Validation | Source |
|---|---|---|---|---|---|
| CUSTOMER_ID | Alphanumeric | 8 | [A-Z]{2}[0-9]{6} | First 2 chars = state code | Registration form |
| ORDER_DATE | Date | 8 (YYYYMMDD) | Not future date | <= current system date | System timestamp |
| QUANTITY | Integer | 3 | 1–999 | > 0 | User entry |
| UNIT_PRICE | Decimal (10,2) | 10 total | 0.01–9999.99 | > 0 | Product file |
4. Data Flow Definition
A data flow is a collection of data elements that moves together from one symbol to another in a DFD.
4.1 Notation for Describing Data Flow Structure
| Symbol | Meaning | Example |
|---|---|---|
| = | is composed of | ORDER = ORDER_HEADER + ORDER_LINES |
| + | and (concatenation) | NAME = FIRST_NAME + LAST_NAME |
| { } | repetition (0 or more) | {PRODUCT} means zero or more PRODUCTs |
| **[ | ]** | selection (one of many) |
| ( ) | optional | DISCOUNT_CODE ( ) means optional |
| *** *** | comment | * This is a comment * |
4.2 Example Data Flow Structures
Example 1: ORDER data flow
ORDER = ORDER_ID + CUSTOMER_ID + ORDER_DATE + {ORDER_LINE} + TOTAL_AMOUNT
ORDER_LINE = PRODUCT_CODE + QUANTITY + LINE_TOTAL
LINE_TOTAL = QUANTITY * UNIT_PRICE (derived)
Example 2: CUSTOMER_QUERY
CUSTOMER_QUERY = [CUSTOMER_ID | LAST_NAME + FIRST_NAME] + (PHONE_NUMBER)
Interpretation: Query by ID OR by name, with an optional phone number.
4.3 Documenting Data Flow Attributes
For each data flow, also document:
- Name: Name as it appears on DFD.
- Composition: Using the structural notation above.
- Frequency: How often does this flow occur? (e.g., 1000 per day).
- Volume: Peak number of occurrences (e.g., 200 per hour).
- Source & Destination: Which processes or entities send/receive it?
5. Data Store Definition
A data store represents a repository of data – typically a file, database table, or collection of records.
5.1 Attributes for Data Store
- Name & Alias: Name on DFD.
- Composition: What records does it contain? (e.g.,
Customer_File = {Customer_Record}). - Organization: Sequential, indexed, relational table, etc.
- Key(s): Primary and secondary keys.
- Volume: Number of records (current and projected growth).
- Retention: How long data is kept (e.g., 7 years).
- Security: Who can read/write? (access control).
5.2 Example Data Store Definition
Data Store: CUSTOMER_FILE
- Composition:
CUSTOMER_FILE = {CUSTOMER_RECORD} - Structure:
CUSTOMER_ID + NAME + ADDRESS + PHONE + EMAIL + CREDIT_LIMIT - Organization: Indexed on
CUSTOMER_ID(Primary Key). - Volume: 50,000 records; growth 5,000/year.
- Security: Sales read/write; Accounting read-only.
6. Process Definition
In a data dictionary, process definitions describe the logic that transforms input data flows into output data flows.
6.1 Process Documentation Attributes
- Process name/number: As shown on DFD (e.g., 2.3 "Calculate Tax").
- Input/Output flows: List all flows entering and leaving the process.
- Data stores accessed: Read/write operations.
- Logic description: Structured English, decision tables, or pseudocode.
- Performance: Frequency, volume, and required response time.
6.2 Example Process Definition
Process 2.3: Calculate Tax
- Inputs:
SUBTOTAL,TAX_CODE - Outputs:
TAX_AMOUNT - Logic (Structured English):
IF TAX_CODE = "EXEMPT" THEN
TAX_AMOUNT = 0
ELSE
Look up RATE from Tax_Rate_Table where TAX_CODE matches
TAX_AMOUNT = SUBTOTAL * RATE
ENDIF
- Response time: Required < 0.1 seconds.
7. Relationship Between DFD and Data Dictionary
The DFD and Data Dictionary are inseparable.
| DFD Element | Data Dictionary Provides |
|---|---|
| Data flow arrow | Composition (list of elements) and attributes. |
| Data store symbol | Record structure, volume, and organization. |
| Process bubble | Detailed logic and performance requirements. |
Rule: Every data flow, data store, and process shown on a DFD must have an entry in the data dictionary.
8. Creating a Data Dictionary – Step-by-Step
- List data flows from DFD: Start from Context and decompose.
- Define composition: Use structural notation to break flows into elements.
- Define atomic elements: Specify types, lengths, and validation rules.
- Define data stores: Specify records, keys, and organization.
- Define processes: Specify the transformation logic (pseudocode).
- Cross-check: Ensure every element and store access is documented.
9. Data Dictionary Tools
| Tool Type | Examples | Features |
|---|---|---|
| Spreadsheets | Excel, Google Sheets | Simple, but lacks automated integrity checking. |
| Databases | MS Access, SQL | Supports relationships and forms. |
| CASE Tools | Visible Analyst, ERwin | Integrated DFD and dictionary management. |
| Professional Documentation | Dataedo, Confluence | High-quality reports and collaboration. |
10. Key Takeaways
- The "Glossary": The Data Dictionary is the single source of truth for symbols on a DFD.
- Atomic Fields: Define smallest units before building complex flows.
- Consistent Notation: Use
+,{},[],()to avoid ambiguity. - Living Document: Must be updated whenever the system analysis changes.
- Complementary: Works with DFDs (movement) and ERDs (structure).
