Overview of Computer Science and Computer Organization
This document provides a foundational overview of computer organization and the principles of programming. We will explore how a computer is structured, how it executes programs, and the fundamental steps involved in translating a human-readable algorithm into machine-executable code using the C programming language.
1. Basic Computer Organization
At a high level, a computer system is designed to take input, process it, and produce output. The foundational model for this is the Von Neumann Architecture, which organizes a computer into three main subsystems connected by a set of wires called a bus.
- Central Processing Unit (CPU): The "brain" of the computer, responsible for executing instructions.
- Main Memory (RAM): A volatile storage area that holds both the program instructions and the data that the program is currently working on.
- Input/Output (I/O) Subsystem: The collection of devices that allow the computer to interact with the outside world.
The System Bus is the communication channel connecting these components. It consists of the Address Bus (carries memory addresses), Data Bus (carries data), and Control Bus (carries control signals).
Inside the CPU
The CPU itself contains several key parts:
- Arithmetic Logic Unit (ALU): Performs all arithmetic (e.g., addition, subtraction) and logical (e.g., AND, OR, NOT) operations.
- Control Unit (CU): Directs the flow of operations. It fetches instructions from memory, decodes them, and tells the other components what to do.
- Registers: Small, extremely fast storage locations within the CPU. Key registers include:
- Program Counter (PC): Holds the memory address of the next instruction to be fetched.
- Instruction Register (IR): Holds the current instruction being executed.
- Memory Address Register (MAR): Holds the address of the memory location to be accessed.
- Memory Data Register (MDR): Temporarily holds data being transferred to or from memory.
2. Program Execution: The Fetch-Decode-Execute Cycle
The CPU executes a program by repeatedly performing a sequence of steps known as the Fetch-Decode-Execute Cycle. Think of it like a chef following a recipe: fetch the next instruction, understand what it means, and then carry it out.
- Fetch Phase:
- The address in the PC is copied to the MAR.
- The Control Unit sends a "read" signal to memory via the control bus.
- The instruction at that memory address is copied to the MDR via the data bus.
- The instruction from the MDR is copied to the IR.
- The PC is incremented to point to the next instruction.
- Decode Phase:
- The Control Unit analyzes the instruction in the IR. It identifies the operation code (opcode) and any operands (data or memory addresses).
- Execute Phase:
- The Control Unit sends signals to the appropriate components (e.g., the ALU) to carry out the instruction. If the instruction requires data from memory, the address is placed in the MAR, and the data is retrieved into the MDR before being processed by the ALU. The result is then stored in a register or written back to memory.
3. Data Storage and Representation
All information in a computer is stored as sequences of binary digits, or bits. A group of 8 bits is a byte. The way this raw binary data is interpreted depends on its context.
Representing Characters: ASCII and Unicode
To represent text, computers use character encoding standards to map characters to numbers.
- ASCII (American Standard Code for Information Interchange): An early standard that uses 7 bits (commonly stored in a byte) to represent 128 characters, including English letters, digits, and punctuation. For example, the character 'A' is represented by the decimal number 65, which is 01000001 in binary.
- Unicode: A modern, universal standard that aims to represent every character from every language. UTF-8 is the most common Unicode encoding on the web. It is backward-compatible with ASCII but can use multiple bytes to represent a vast range of characters (e.g., '€', 'α', '😊').
4. Problems, Algorithms, and Programming
Computer Science is fundamentally about solving problems. This process involves a clear progression from an abstract idea to a concrete solution.
- Problem: A well-defined task that needs to be accomplished.
- Algorithm: A finite, step-by-step, unambiguous procedure for solving a specific problem.
- Program: The implementation of an algorithm in a specific programming language.
Measuring Efficiency: Big-O, Big-\(\Omega\), and Big-\(\Theta\) Notation
When we evaluate an algorithm's efficiency, we are interested in how its resource usage (typically time) grows as the size of the input grows. We use asymptotic notation to describe these growth rates.
Big-O (O): Asymptotic Upper Bound
Big-O describes the worst-case scenario. A function \(T(n)\) is in \(O(f(n))\) if there exist positive constants \(c\) and \(n_0\) such that for all \(n \ge n_0\), the following condition holds:
\[ 0 \le T(n) \le c \cdot f(n) \]
This means that for a large enough input \(n\), the algorithm's runtime is bounded above by \(c \cdot f(n)\). The constant \(c\) accounts for machine-specific factors, allowing us to focus purely on the growth rate.
Big-Omega (\(\Omega\)): Asymptotic Lower Bound
Big-\(\Omega\) describes the best-case scenario. A function \(T(n)\) is in \(\Omega(g(n))\) if there exist positive constants \(c\) and \(n_0\) such that for all \(n \ge n_0\):
\[ 0 \le c \cdot g(n) \le T(n) \]
This provides a lower bound on the algorithm's runtime.
Big-Theta (\(\Theta\)): Asymptotic Tight Bound
Big-\(\Theta\) provides a tight bound, meaning the algorithm's runtime is bounded both above and below. An algorithm is \(\Theta(h(n))\) if it is both \(O(h(n))\) and \(\Omega(h(n))\).
Common Complexity Classes
- \(O(1)\) - Constant Time: The algorithm takes the same amount of time regardless of input size. Ex: Accessing an array element at a known index.
- \(O(\log n)\) - Logarithmic Time: The time taken increases with the logarithm of the input size. These algorithms typically work by repeatedly dividing the problem size. Ex: Binary Search in a sorted array.
- \(O(n)\) - Linear Time: The runtime grows linearly with the input size 'n'. Ex: Searching an unsorted array or finding the maximum value.
- \(O(n^2)\) - Quadratic Time: The runtime grows quadratically. Ex: A simple sorting algorithm like Bubble Sort, or iterating through all pairs in a list.
Mathematical Proof of Complexity: Example
Consider the following code for finding pairs in an array:
#include <stdio.h>

void findPairs(int arr[], int n) {
    for (int i = 0; i < n; i++) {                     // Loop 1: runs n times
        for (int j = 0; j < n; j++) {                 // Loop 2: runs n times per outer iteration
            printf("(%d, %d)\n", arr[i], arr[j]);     // Core operation
        }
    }
}
To prove this is \(O(n^2)\), we count the fundamental operations. The `printf` statement is the core operation. Loop 1 runs \(n\) times. For each iteration of Loop 1, Loop 2 also runs \(n\) times. Therefore, the total number of times the `printf` operation is executed is:
\[ T(n) = n \times n = n^2 \]
According to the formal definition of Big-O, we need to find constants \(c\) and \(n_0\) such that \(T(n) \le c \cdot n^2\). If we choose \(c=1\) and \(n_0=1\), the condition \(n^2 \le 1 \cdot n^2\) is always true for \(n \ge 1\). Thus, the function's time complexity is \(O(n^2)\).
Properties of Big-O Notation
- Sum Rule: If an algorithm performs two independent tasks, one taking \(O(f(n))\) time and the other taking \(O(g(n))\) time, the total time complexity is \(O(\max(f(n), g(n)))\). We only care about the term that grows faster.
- Product Rule: If an algorithm performs an operation \(O(g(n))\) times within a loop that runs \(O(f(n))\) times, the total complexity is \(O(f(n) \times g(n))\). This is what we used in the proof above.
5. Program Structure and The Toolchain
When you write a program in a high-level language like C, it must go through several stages before it can be executed by the CPU. This is known as the build or compilation process.
- Source Code: The human-readable .c file.
- Preprocessor: Processes directives like #include and #define. For example, it replaces #include <stdio.h> with the actual contents of the standard I/O header file.
- Compiler: Translates the preprocessed code into assembly language. The compiler may also perform optimizations to make the code run faster or use less memory.
- Assembler: Converts the assembly code into machine code (object file).
- Linker: Combines your object file with object files from libraries to create a single executable file. This can be done:
- Statically: The library code is copied directly into your executable file. The file is larger but self-contained.
- Dynamically: The executable contains references to shared libraries on the system. The file is smaller, but requires the libraries to be present at runtime.
6. ANSI C: First Steps to Programming
ANSI C is a standardized version of the C programming language. C provides low-level access to memory while still offering high-level constructs.
Basic C Data Types
| Type | Description | Typical Size |
|---|---|---|
| int | Integer numbers | 4 bytes |
| float | Single-precision floating-point (decimal) numbers | 4 bytes |
| double | Double-precision floating-point numbers | 8 bytes |
| char | A single character | 1 byte |
Basic Arithmetic Operators
C supports standard arithmetic operators.
#include <stdio.h>

int main(void) {
    int a = 10;
    int b = 4;
    printf("Sum: %d\n", a + b);        // Output: 14
    printf("Difference: %d\n", a - b); // Output: 6
    printf("Product: %d\n", a * b);    // Output: 40
    printf("Quotient: %d\n", a / b);   // Output: 2 (integer division)
    printf("Remainder: %d\n", a % b);  // Output: 2 (modulo operator)
    return 0;
}