Overview of Computer Science and Computer Organization

This document provides a foundational overview of computer organization and the principles of programming. We will explore how a computer is structured, how it executes programs, and the fundamental steps involved in translating a human-readable algorithm into machine-executable code using the C programming language.

1. Basic Computer Organization

At a high level, a computer system is designed to take input, process it, and produce output. The foundational model for this is the Von Neumann Architecture, which organizes a computer into three main subsystems connected by a set of wires called a bus.

[Diagram: the CPU (Control Unit, ALU, Registers), Main Memory (RAM), and Input/Output devices (keyboard, display, etc.), connected by the System Bus: Address Bus, Data Bus, and Control Bus.]
  • Central Processing Unit (CPU): The "brain" of the computer, responsible for executing instructions.
  • Main Memory (RAM): A volatile storage area that holds both the program instructions and the data that the program is currently working on.
  • Input/Output (I/O) Subsystem: The collection of devices that allow the computer to interact with the outside world.

The System Bus is the communication channel connecting these components. It consists of the Address Bus (carries memory addresses), Data Bus (carries data), and Control Bus (carries control signals).

Inside the CPU

The CPU itself contains several key parts:

  • Arithmetic Logic Unit (ALU): Performs all arithmetic (e.g., addition, subtraction) and logical (e.g., AND, OR, NOT) operations.
  • Control Unit (CU): Directs the flow of operations. It fetches instructions from memory, decodes them, and tells the other components what to do.
  • Registers: Small, extremely fast storage locations within the CPU. Key registers include:
    • Program Counter (PC): Holds the memory address of the next instruction to be fetched.
    • Instruction Register (IR): Holds the current instruction being executed.
    • Memory Address Register (MAR): Holds the address of the memory location to be accessed.
    • Memory Data Register (MDR): Temporarily holds data being transferred to or from memory.

2. Program Execution: The Fetch-Decode-Execute Cycle

The CPU executes a program by repeatedly performing a sequence of steps known as the Fetch-Decode-Execute Cycle. Think of it like a chef following a recipe: fetch the next instruction, understand what it means, and then carry it out.

[Diagram: FETCH (get instruction) → DECODE (understand instruction) → EXECUTE (perform action), repeated in a cycle.]
  1. Fetch Phase:
    • The address in the PC is copied to the MAR.
    • The Control Unit sends a "read" signal to memory via the control bus.
    • The instruction at that memory address is copied to the MDR via the data bus.
    • The instruction from the MDR is copied to the IR.
    • The PC is incremented to point to the next instruction.
  2. Decode Phase:
    • The Control Unit analyzes the instruction in the IR. It identifies the operation code (opcode) and any operands (data or memory addresses).
  3. Execute Phase:
    • The Control Unit sends signals to the appropriate components (e.g., the ALU) to carry out the instruction. If the instruction requires data from memory, the address is placed in the MAR, and the data is retrieved into the MDR before being processed by the ALU. The result is then stored in a register or written back to memory.

3. Data Storage and Representation

All information in a computer is stored as sequences of binary digits, or bits. A group of 8 bits is a byte. The way this raw binary data is interpreted depends on its context.

Representing Characters: ASCII and Unicode

To represent text, computers use character encoding standards to map characters to numbers.

  • ASCII (American Standard Code for Information Interchange): An early standard that uses 7 bits (commonly stored in a byte) to represent 128 characters, including English letters, digits, and punctuation. For example, the character 'A' is represented by the decimal number 65, which is 01000001 in binary.
  • Unicode: A modern, universal standard that aims to represent every character from every language. UTF-8 is the most common Unicode encoding on the web. It is backward-compatible with ASCII but can use multiple bytes to represent a vast range of characters (e.g., '€', 'α', '😊').

4. Problems, Algorithms, and Programming

Computer Science is fundamentally about solving problems. This process involves a clear progression from an abstract idea to a concrete solution.

  • Problem: A well-defined task that needs to be accomplished.
  • Algorithm: A finite, step-by-step, unambiguous procedure for solving a specific problem.
  • Program: The implementation of an algorithm in a specific programming language.

Measuring Efficiency: Big-O, Big-\(\Omega\), and Big-\(\Theta\) Notation

When we evaluate an algorithm's efficiency, we are interested in how its resource usage (typically time) grows as the size of the input grows. We use asymptotic notation to describe these growth rates.

Big-O (O): Asymptotic Upper Bound

Big-O describes the worst-case scenario. A function \(T(n)\) is in \(O(f(n))\) if there exist positive constants \(c\) and \(n_0\) such that for all \(n \ge n_0\), the following condition holds:

\[ 0 \le T(n) \le c \cdot f(n) \]

This means that for a large enough input \(n\), the algorithm's runtime is bounded above by \(c \cdot f(n)\). The constant \(c\) accounts for machine-specific factors, allowing us to focus purely on the growth rate.
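As a quick worked instance of the definition: suppose an algorithm's runtime is \(T(n) = 3n + 5\). Choosing the (illustrative) constants \(c = 4\) and \(n_0 = 5\),

\[ 3n + 5 \le 3n + n = 4n = c \cdot n \quad \text{for all } n \ge 5, \]

so \(T(n)\) is in \(O(n)\): the constant term and the coefficient disappear into \(c\) and \(n_0\).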

Big-Omega (\(\Omega\)): Asymptotic Lower Bound

Big-\(\Omega\) describes the best-case scenario. A function \(T(n)\) is in \(\Omega(g(n))\) if there exist positive constants \(c\) and \(n_0\) such that for all \(n \ge n_0\):

\[ T(n) \ge c \cdot g(n) \ge 0 \]

This provides a lower bound on the algorithm's runtime.

Big-Theta (\(\Theta\)): Asymptotic Tight Bound

Big-\(\Theta\) provides a tight bound, meaning the algorithm's runtime is bounded both above and below. An algorithm is \(\Theta(h(n))\) if it is both \(O(h(n))\) and \(\Omega(h(n))\).

[Graph: T(n) plotted against input size n, bounded above by c·f(n) (the O bound) and below by c·g(n) (the Ω bound) for all n ≥ n₀.]

Common Complexity Classes

[Graph: running time versus input size n for O(1), O(log n), O(n), and O(n²).]
  • \(O(1)\) - Constant Time: The algorithm takes the same amount of time regardless of input size. Ex: Accessing an array element at a known index.
  • \(O(\log n)\) - Logarithmic Time: The time taken increases with the logarithm of the input size. These algorithms typically work by repeatedly dividing the problem size. Ex: Binary Search in a sorted array.
  • \(O(n)\) - Linear Time: The runtime grows linearly with the input size 'n'. Ex: Searching an unsorted array or finding the maximum value.
  • \(O(n^2)\) - Quadratic Time: The runtime grows quadratically. Ex: A simple sorting algorithm like Bubble Sort, or iterating through all pairs in a list.

Mathematical Proof of Complexity: Example

Consider the following code for finding pairs in an array:

void findPairs(int arr[], int n) {
    for (int i = 0; i < n; i++) {       // Loop 1
        for (int j = 0; j < n; j++) {   // Loop 2
            printf("(%d, %d)\n", arr[i], arr[j]); // Operation
        }
    }
}

To prove this is \(O(n^2)\), we count the fundamental operations. The `printf` statement is the core operation. Loop 1 runs \(n\) times. For each iteration of Loop 1, Loop 2 also runs \(n\) times. Therefore, the total number of times the `printf` operation is executed is:

\[ T(n) = n \times n = n^2 \]

According to the formal definition of Big-O, we need to find constants \(c\) and \(n_0\) such that \(T(n) \le c \cdot n^2\). If we choose \(c=1\) and \(n_0=1\), the condition \(n^2 \le 1 \cdot n^2\) is always true for \(n \ge 1\). Thus, the function's time complexity is \(O(n^2)\).

Properties of Big-O Notation

  • Sum Rule: If an algorithm performs two independent tasks, one taking \(O(f(n))\) time and the other taking \(O(g(n))\) time, the total time complexity is \(O(\max(f(n), g(n)))\). We only care about the term that grows faster.
  • Product Rule: If an algorithm performs an operation \(O(g(n))\) times within a loop that runs \(O(f(n))\) times, the total complexity is \(O(f(n) \times g(n))\). This is what we used in the proof above.

5. Program Structure and The Toolchain

When you write a program in a high-level language like C, it must go through several stages before it can be executed by the CPU. This is known as the build or compilation process.

Source Code (program.c) → Preprocessor (program.i) → Compiler (program.s) → Assembler (program.o) → Linker → Executable (a.out / program.exe)
  1. Source Code: The human-readable .c file.
  2. Preprocessor: Processes directives like #include and #define. For example, it replaces #include <stdio.h> with the actual contents of the standard I/O header file.
  3. Compiler: Translates the preprocessed code into assembly language. The compiler may also perform optimizations to make the code run faster or use less memory.
  4. Assembler: Converts the assembly code into machine code (object file).
  5. Linker: Combines your object file with object files from libraries to create a single executable file. This can be done:
    • Statically: The library code is copied directly into your executable file. The file is larger but self-contained.
    • Dynamically: The executable contains references to shared libraries on the system. The file is smaller, but requires the libraries to be present at runtime.

6. ANSI C: First Steps to Programming

ANSI C is a standardized version of the C programming language. C provides low-level access to memory while still offering high-level constructs.

Basic C Data Types

Type     Description                                         Typical Size
int      Integer numbers                                     4 bytes
float    Single-precision floating-point (decimal) numbers   4 bytes
double   Double-precision floating-point numbers             8 bytes
char     A single character                                  1 byte

Basic Arithmetic Operators

C supports standard arithmetic operators.

#include <stdio.h>

int main(void) {
    int a = 10;
    int b = 4;

    printf("Sum: %d\n", a + b);        // Output: 14
    printf("Difference: %d\n", a - b);  // Output: 6
    printf("Product: %d\n", a * b);    // Output: 40
    printf("Quotient: %d\n", a / b);    // Output: 2 (integer division)
    printf("Remainder: %d\n", a % b);   // Output: 2 (modulo operator)

    return 0;
}

Practice Questions

1. What are the three main components of the Von Neumann architecture, and what is the primary function of each?

2. Explain the roles of the MAR and MDR during a memory read operation.

3. Write a complete ANSI C program that asks the user for their name (a single word) and their age, and then prints a message like "Hello [Name], you are [Age] years old."

4. What is the difference between Big-O and Big-Omega (\(\Omega\)) notation?

5. What is the difference between static and dynamic linking?

6. What would be the output of the following C code snippet?

int x = 10;
printf("Value is %d\n", x);
x = x + 5;
printf("New value is %d\n", x);

7. The bit pattern 01000011 (decimal 67) can represent both the character 'C' and the integer 67. How does a program determine which interpretation to use?

8. What is the purpose of the #include <stdio.h> directive in a C program?

9. Write an ANSI C program that prompts the user to enter the length and width of a rectangle (as floating-point numbers) and then calculates and prints its area.

10. Differentiate between the Arithmetic Logic Unit (ALU) and the Control Unit (CU) within a CPU.

11. An algorithm to sort a list has a best-case time complexity of \(\Omega(n \log n)\) and a worst-case of \(O(n^2)\). Can we describe its complexity using \(\Theta\) notation? Why or why not?

12. What does the & symbol do in the statement scanf("%d", &age);?

13. Briefly describe the roles of the compiler and the assembler.

14. Write a C program that takes two integers as input from the user, calculates their sum and product, and prints both results.

15. Identify and correct the error in the following C code:

#include <stdio.h>
int main(void) {
    int number = 5;
    printf("The number is %f\n", number);
    return 0;
}

16. Using the Sum and Product rules, determine the Big-O time complexity of the following C code snippet and explain your reasoning.

for (int i = 0; i < n; i++) {
    printf("%d\n", i);
}
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        printf("%d, %d\n", i, j);
    }
}

Solutions & Explanations

1. Von Neumann Architecture Components

The three main components are the CPU (Central Processing Unit), Main Memory (RAM), and the I/O Subsystem.

  • CPU: Executes program instructions.
  • Main Memory: Stores the instructions and data for currently running programs.
  • I/O Subsystem: Handles communication with external devices like the keyboard, mouse, and monitor.

2. MAR and MDR in a Memory Read

During a memory read:

  1. The CPU writes the memory address of the data it needs into the MAR (Memory Address Register).
  2. The CPU sends a read signal to the memory controller.
  3. The memory system retrieves the data from the specified address and places it on the data bus.
  4. The CPU reads this data from the bus into the MDR (Memory Data Register), from where it can be used by other parts of the CPU.

3. C Program: User Name and Age

#include <stdio.h>

int main(void) {
    char name[100]; // Declare a character array to store the name string
    int age;          // Declare an integer variable for the age

    // Prompt for and read the name
    printf("Please enter your first name: ");
    scanf("%99s", name); // Width limit leaves room for '\0' and prevents overflow

    // Prompt for and read the age
    printf("Please enter your age: ");
    scanf("%d", &age);

    // Print the final message
    printf("Hello %s, you are %d years old.\n", name, age);

    return 0;
}

4. Big-O vs. Big-Omega

Big-O (O) notation describes an asymptotic upper bound on the growth rate of a function, representing the worst-case time complexity. Big-Omega (\(\Omega\)) notation describes an asymptotic lower bound, representing the best-case time complexity. An algorithm can run no faster than its \(\Omega\) bound and no slower than its O bound for large inputs.

5. Static vs. Dynamic Linking

Static Linking copies all necessary library code directly into the final executable file. This creates a larger, self-contained file. Dynamic Linking places references (or stubs) to shared system libraries in the executable. The actual linking happens at runtime when the program is loaded. This results in smaller executable files and allows libraries to be updated centrally.

6. Code Snippet Output

The code declares an integer `x` with value 10, prints it, adds 5 to it, and prints it again. The output will be:

Value is 10
New value is 15

7. Character vs. Integer Representation

The binary pattern 01000011 can be interpreted in different ways based on context. In the ASCII standard, this pattern is the code for the character 'C'. If a program interprets that same pattern as a pure binary number, its decimal value is 67. The computer doesn't know the difference; the program's logic (e.g., using %c vs. %d in printf) determines how the bits are interpreted and displayed.

8. #include <stdio.h>

This is a preprocessor directive. It tells the C preprocessor to include the contents of the Standard Input/Output Header file (`stdio.h`). This file contains declarations for standard functions like `printf()` and `scanf()`, allowing your program to use them without errors.

9. C Program: Rectangle Area (Float)

#include <stdio.h>

int main(void) {
    float length, width, area; // Declare float variables

    printf("Enter the length of the rectangle: ");
    scanf("%f", &length);

    printf("Enter the width of the rectangle: ");
    scanf("%f", &width);

    area = length * width;

    printf("The area of the rectangle is: %f\n", area);

    return 0;
}

10. ALU vs. CU

The ALU (Arithmetic Logic Unit) is the part of the CPU that performs calculations and logical comparisons. The CU (Control Unit) is the part that directs the operations; it fetches and decodes instructions and sends control signals to the other components. The CU is the manager, while the ALU is the calculator.

11. Using Theta Notation

No, we cannot describe its complexity using \(\Theta\) notation. \(\Theta\) notation is used only when the best-case (\(\Omega\)) and worst-case (O) complexities are the same. Since the lower bound \(\Omega(n \log n)\) is different from the upper bound \(O(n^2)\), a tight bound cannot be established.

12. The `&` Symbol in scanf

The `&` is the "address-of" operator in C. The `scanf` function needs to know the memory address of the variable where it should store the input data. So, &age provides the memory location of the `age` variable, allowing `scanf` to place the integer typed by the user directly into that location.

13. Compiler and Assembler Roles

The Compiler translates high-level source code (like C) into architecture-specific assembly language. The Assembler then takes that assembly language code and translates it into machine code (binary), which is the only language the CPU can directly execute.

14. C Program: Sum and Product

#include <stdio.h>

int main(void) {
    int num1, num2, sum, product;

    printf("Enter the first integer: ");
    scanf("%d", &num1);

    printf("Enter the second integer: ");
    scanf("%d", &num2);

    sum = num1 + num2;
    product = num1 * num2;

    printf("Sum: %d\n", sum);
    printf("Product: %d\n", product);

    return 0;
}

15. C Code Error Correction

The error is in the `printf` statement. The variable `number` is an `int` (integer), but the format specifier used is `%f`, which is for a `float`. This mismatch leads to incorrect output. The correct format specifier for an `int` is `%d`.

Corrected Code:

#include <stdio.h>
int main(void) {
    int number = 5;
    // Correct format specifier %d for an integer
    printf("The number is %d\n", number);
    return 0;
}

16. Big-O Analysis of Code Snippet

We analyze the two parts of the code separately.

Part 1: The first loop.

for (int i = 0; i < n; i++) {
    printf("%d\n", i);
}

This loop runs 'n' times. Therefore, its time complexity is \(O(n)\).

Part 2: The nested loops.

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        printf("%d, %d\n", i, j);
    }
}

This is a loop within a loop. The outer loop runs 'n' times, and the inner loop also runs 'n' times for each outer iteration. By the Product Rule, the complexity is \(O(n \times n) = O(n^2)\).

Total Complexity:

The two parts run sequentially. Using the Sum Rule, the total complexity is the maximum of the complexities of the individual parts: \(O(n) + O(n^2) = O(\max(n, n^2))\). Since \(n^2\) grows much faster than \(n\), the dominant term is \(n^2\).

The final time complexity is \(O(n^2)\).