SYSTEM ANALYSIS AND DESIGN Data Dictionary

Unit 1: Data Dictionary

1. What is a Data Dictionary?

A Data Dictionary (also called data repository or metadata repository) is a centralized collection of definitions for all data elements, data flows, data stores, records, and processes within a system.

Analogy: If a DFD is a map showing roads (data flows) and buildings (processes/data stores), the Data Dictionary is the legend and address book that explains what each symbol means, what data travels on each road, and the structure of each building.

Data Dictionary Analogy: Map vs Legend

1.1 Why is a Data Dictionary Needed?

Problem without Data DictionarySolution with Data Dictionary
Ambiguous data names (“customer” means different things to sales and billing)Single, agreed definition.
Duplicate data elements stored inconsistentlyShows where data is stored and its format.
Missing or incorrect data flows in DFDData flows must be defined in dictionary.
Difficulty estimating storage needsDictionary records data lengths and volumes.
Inconsistent field attributesStandardized data lengths, types, and validations.
Onboarding challengesProvides a complete reference for new team members.

1.2 When is a Data Dictionary Created?

  • During analysis: As DFDs are drawn, each data flow and data store is defined.
  • Maintained throughout design: Evolving as technology choices are made.
  • Delivered as documentation: Part of the final SRS for the operations and maintenance teams.

2. Core Components of a Data Dictionary

A complete data dictionary defines four types of entries:

Core Components of a Data Dictionary

Entry TypeDefinitionExample
Data ElementSmallest unit of data (atomic); cannot be broken down further.Customer_ID, Order_Date, Quantity
Data FlowA collection of data elements moving together.Order (contains Customer_ID, Order_Date, {Product})
Data StoreA collection of data records (e.g., file, database table).Customer_File (contains many Customer records)
ProcessA transformation; includes input flows, output flows, and logic description.Calculate_Tax (inputs: Subtotal, Tax_Code; outputs: Tax_Amount)

3. Data Element Definition

A data element (also called field, attribute, data item) is the most detailed level of data.

3.1 Attributes to Document for Each Data Element

Data Element Attribute Profile Card

AttributeDescriptionExample
NameUnique identifier, often in uppercase with underscoresCUSTOMER_ID
AliasOther names used in the organizationCustID, ClientNumber
TypeData type: alphanumeric, numeric, date, boolean, etc.Numeric (integer)
LengthMaximum number of characters or digits10 (for CUSTOMER_ID)
Range / FormatValid values or pattern10000–99999; format: 5-digit number
Default valueValue if none provided0 for QUANTITY
Null allowed?Can this field be empty?No (NOT NULL)
Validation ruleBusiness rule for validityMust exist in Customer master
SourceWhere does this data originate?User input from registration screen
DerivationFormula if calculatedTOTAL = PRICE * QUANTITY

3.2 Example Data Element Definitions

NameTypeLengthRangeValidationSource
CUSTOMER_IDAlphanumeric8[A-Z]{2}[0-9]{6}First 2 chars = state codeRegistration form
ORDER_DATEDate8 (YYYYMMDD)Not future date<= current system dateSystem timestamp
QUANTITYInteger31–999> 0User entry
UNIT_PRICEDecimal (10,2)10 total0.01–9999.99> 0Product file

4. Data Flow Definition

A data flow is a collection of data elements that moves together from one symbol to another in a DFD.

4.1 Notation for Describing Data Flow Structure

Data Dictionary Structural Notation Symbols

SymbolMeaningExample
=is composed ofORDER = ORDER_HEADER + ORDER_LINES
+and (concatenation)NAME = FIRST_NAME + LAST_NAME
{ }repetition (0 or more){PRODUCT} means zero or more PRODUCTs
**[]**selection (one of many)
( )optionalDISCOUNT_CODE ( ) means optional
*** ***comment* This is a comment *

4.2 Example Data Flow Structures

Data Flow Composition Example: ORDER

Example 1: ORDER data flow

ORDER = ORDER_ID + CUSTOMER_ID + ORDER_DATE + {ORDER_LINE} + TOTAL_AMOUNT
ORDER_LINE = PRODUCT_CODE + QUANTITY + LINE_TOTAL
LINE_TOTAL = QUANTITY * UNIT_PRICE (derived)

Example 2: CUSTOMER_QUERY

CUSTOMER_QUERY = [CUSTOMER_ID | LAST_NAME + FIRST_NAME] + (PHONE_NUMBER)

Interpretation: Query by ID OR by name, with an optional phone number.

4.3 Documenting Data Flow Attributes

For each data flow, also document:

  • Name: Name as it appears on DFD.
  • Composition: Using the structural notation above.
  • Frequency: How often does this flow occur? (e.g., 1000 per day).
  • Volume: Peak number of occurrences (e.g., 200 per hour).
  • Source & Destination: Which processes or entities send/receive it?

5. Data Store Definition

A data store represents a repository of data – typically a file, database table, or collection of records.

5.1 Attributes for Data Store

  • Name & Alias: Name on DFD.
  • Composition: What records does it contain? (e.g., Customer_File = {Customer_Record}).
  • Organization: Sequential, indexed, relational table, etc.
  • Key(s): Primary and secondary keys.
  • Volume: Number of records (current and projected growth).
  • Retention: How long data is kept (e.g., 7 years).
  • Security: Who can read/write? (access control).

5.2 Example Data Store Definition

Data Store: CUSTOMER_FILE

  • Composition: CUSTOMER_FILE = {CUSTOMER_RECORD}
  • Structure: CUSTOMER_ID + NAME + ADDRESS + PHONE + EMAIL + CREDIT_LIMIT
  • Organization: Indexed on CUSTOMER_ID (Primary Key).
  • Volume: 50,000 records; growth 5,000/year.
  • Security: Sales read/write; Accounting read-only.

6. Process Definition

In a data dictionary, process definitions describe the logic that transforms input data flows into output data flows.

6.1 Process Documentation Attributes

  • Process name/number: As shown on DFD (e.g., 2.3 "Calculate Tax").
  • Input/Output flows: List all flows entering and leaving the process.
  • Data stores accessed: Read/write operations.
  • Logic description: Structured English, decision tables, or pseudocode.
  • Performance: Frequency, volume, and required response time.

6.2 Example Process Definition

Process 2.3: Calculate Tax

  • Inputs: SUBTOTAL, TAX_CODE
  • Outputs: TAX_AMOUNT
  • Logic (Structured English):
IF TAX_CODE = "EXEMPT" THEN
    TAX_AMOUNT = 0
ELSE
    Look up RATE from Tax_Rate_Table where TAX_CODE matches
    TAX_AMOUNT = SUBTOTAL * RATE
ENDIF
  • Response time: Required < 0.1 seconds.

7. Relationship Between DFD and Data Dictionary

The DFD and Data Dictionary are inseparable.

Relationship Between DFD and Data Dictionary

DFD ElementData Dictionary Provides
Data flow arrowComposition (list of elements) and attributes.
Data store symbolRecord structure, volume, and organization.
Process bubbleDetailed logic and performance requirements.

Rule: Every data flow, data store, and process shown on a DFD must have an entry in the data dictionary.

8. Creating a Data Dictionary – Step-by-Step

  1. List data flows from DFD: Start from Context and decompose.
  2. Define composition: Use structural notation to break flows into elements.
  3. Define atomic elements: Specify types, lengths, and validation rules.
  4. Define data stores: Specify records, keys, and organization.
  5. Define processes: Specify the transformation logic (pseudocode).
  6. Cross-check: Ensure every element and store access is documented.

9. Data Dictionary Tools

Tool TypeExamplesFeatures
SpreadsheetsExcel, Google SheetsSimple, but lacks automated integrity checking.
DatabasesMS Access, SQLSupports relationships and forms.
CASE ToolsVisible Analyst, ERwinIntegrated DFD and dictionary management.
Professional DocumentationDataedo, ConfluenceHigh-quality reports and collaboration.

10. Key Takeaways

  • The "Glossary": The Data Dictionary is the single source of truth for symbols on a DFD.
  • Atomic Fields: Define smallest units before building complex flows.
  • Consistent Notation: Use +, {}, [], () to avoid ambiguity.
  • Living Document: Must be updated whenever the system analysis changes.
  • Complementary: Works with DFDs (movement) and ERDs (structure).
Hi! Need help with studies? 👋
AI