Topic 2: Organizing Data: Easy Sorting and Structures
With the basics of arrays and functions, we can now explore more powerful ways to organize and process data. Structs let us create custom data types, and sorting allows us to order that data meaningfully.
Sorting Algorithms: From Simple to Efficient
Sorting is the process of arranging elements in a specific order. While Bubble Sort is easy to understand, it's very slow. Let's explore several different methods, analyze their speed, and see why efficient sorting is one of the most important concepts in computer science.
1. Bubble Sort (Simple, $O(n^2)$)
As covered, Bubble Sort repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. Smaller (or larger) elements "bubble" up to their correct position.
Simulation (Array: [5, 1, 4, 2])
Pass 1:
[5, 1, 4, 2] -> Swap -> [1, 5, 4, 2]
[1, 5, 4, 2] -> Swap -> [1, 4, 5, 2]
[1, 4, 5, 2] -> Swap -> [1, 4, 2, 5] (5 is sorted)
Pass 2:
[1, 4, 2, 5] -> No Swap
[1, 4, 2, 5] -> Swap -> [1, 2, 4, 5] (4 is sorted)
Pass 3:
[1, 2, 4, 5] -> No Swap (1 is sorted)
Final: [1, 2, 4, 5]
#include <stdio.h>
void swap(int *a, int *b) {
int temp = *a;
*a = *b;
*b = temp;
}
void bubbleSort(int arr[], int n) {
// Outer loop for number of passes
// After i-th pass, the last i elements are sorted
for (int i = 0; i < n - 1; i++) {
int swapped = 0; // Optimization
// Inner loop for comparisons
// We stop at n-1-i because the end is already sorted
for (int j = 0; j < n - 1 - i; j++) {
if (arr[j] > arr[j + 1]) { // Compare adjacent
swap(&arr[j], &arr[j + 1]);
swapped = 1;
}
}
// If no swaps in a pass, array is sorted
if (swapped == 0) {
break;
}
}
}
void printArray(int arr[], int size) {
for (int i = 0; i < size; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}
int main() {
int data[] = {64, 34, 25, 12, 22, 11, 90};
int n = 7;
printf("Unsorted array: ");
printArray(data, n);
bubbleSort(data, n);
printf("Sorted array: ");
printArray(data, n);
return 0;
}
2. Selection Sort (Simple, $O(n^2)$)
Selection Sort works by dividing the array into two parts: a sorted part at the beginning and an unsorted part at the end. The algorithm repeatedly finds the smallest element from the unsorted part and swaps it with the first element of the unsorted part, thus growing the sorted part.
Simulation (Array: [64, 25, 12, 22])
Pass 1: (Find smallest in [64, 25, 12, 22])
- Smallest is 12. Swap with 64.
- Result: [12, 25, 64, 22]
Pass 2: (Find smallest in [25, 64, 22])
- Smallest is 22. Swap with 25.
- Result: [12, 22, 64, 25]
Pass 3: (Find smallest in [64, 25])
- Smallest is 25. Swap with 64.
- Result: [12, 22, 25, 64]
Final: [12, 22, 25, 64]
#include <stdio.h>
// We already have swap() and printArray() from above
void selectionSort(int arr[], int n) {
for (int i = 0; i < n - 1; i++) {
// Find the index of the minimum element in the unsorted part
int min_index = i;
for (int j = i + 1; j < n; j++) {
if (arr[j] < arr[min_index]) {
min_index = j;
}
}
// Swap the found minimum element with the first unsorted element
if (min_index != i) {
swap(&arr[i], &arr[min_index]);
}
}
}
int main_selection() {
int data[] = {64, 25, 12, 22, 11};
int n = 5;
printf("Unsorted: ");
printArray(data, n);
selectionSort(data, n);
printf("Sorted: ");
printArray(data, n); // Output: 11 12 22 25 64
return 0;
}
3. Insertion Sort (Simple, $O(n^2)$)
Insertion Sort also builds the final sorted array one item at a time. It's much like sorting playing cards in your hand. It iterates through the array, and for each element, it "inserts" it into its correct position within the already-sorted part of the array to its left.
Simulation (Array: [12, 11, 13, 5])
- Sorted part: [12]
Take 11: Shift 12 right. Insert 11.
- Sorted part: [11, 12]
Take 13: 13 > 12. No shift. Insert 13.
- Sorted part: [11, 12, 13]
Take 5: 5 < 13, 5 < 12, 5 < 11. Shift 13, 12, 11 right. Insert 5.
- Sorted part: [5, 11, 12, 13]
Final: [5, 11, 12, 13]
#include <stdio.h>
// We already have printArray() from above
void insertionSort(int arr[], int n) {
for (int i = 1; i < n; i++) {
// The element to be inserted
int key = arr[i];
int j = i - 1;
// Move elements of arr[0..i-1] that are greater than
// the key, to one position ahead
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j = j - 1;
}
// Insert the key at its correct position
arr[j + 1] = key;
}
}
int main_insertion() {
int data[] = {12, 11, 13, 5, 6};
int n = 5;
printf("Unsorted: ");
printArray(data, n);
insertionSort(data, n);
printf("Sorted: ");
printArray(data, n); // Output: 5 6 11 12 13
return 0;
}
4. Merge Sort (Efficient, $O(n \log n)$)
Now we move to the fast algorithms. Merge Sort is a "Divide and Conquer" algorithm. It works by:
- Divide: Recursively splitting the array in half until it has many tiny arrays of 1 element (which are, by definition, sorted).
- Conquer: Merging the tiny sorted arrays back together in the correct order, creating larger sorted arrays.
- Combine: Eventually, it merges the final two sorted halves into the fully sorted array.
Its main cost is that it requires extra memory (a "temp" array) to do the merging.
Simulation (Array: [5, 2, 4, 1])
Divide:
- Split: [5, 2] and [4, 1]
- Split: [5], [2] and [4], [1]
Conquer (Merge):
- Merge [5] and [2] -> [2, 5]
- Merge [4] and [1] -> [1, 4]
- Merge [2, 5] and [1, 4] -> [1, 2, 4, 5]
Final: [1, 2, 4, 5]
#include <stdio.h>
#include <stdlib.h> // For malloc/free
// Merge two subarrays L and R into arr
void merge(int arr[], int l, int m, int r) {
int i, j, k;
int n1 = m - l + 1;
int n2 = r - m;
// Create temp arrays
int L[n1], R[n2];
// Copy data to temp arrays L[] and R[]
for (i = 0; i < n1; i++)
L[i] = arr[l + i];
for (j = 0; j < n2; j++)
R[j] = arr[m + 1 + j];
// Merge the temp arrays back into arr[l..r]
i = 0; // Initial index of first subarray
j = 0; // Initial index of second subarray
k = l; // Initial index of merged subarray
while (i < n1 && j < n2) {
if (L[i] <= R[j]) {
arr[k] = L[i];
i++;
} else {
arr[k] = R[j];
j++;
}
k++;
}
// Copy remaining elements of L[]
while (i < n1) { arr[k] = L[i]; i++; k++; }
// Copy remaining elements of R[]
while (j < n2) { arr[k] = R[j]; j++; k++; }
}
// Main mergeSort function
void mergeSort(int arr[], int l, int r) {
if (l < r) {
int m = l + (r - l) / 2; // Avoids overflow
// Sort first and second halves
mergeSort(arr, l, m);
mergeSort(arr, m + 1, r);
// Merge the sorted halves
merge(arr, l, m, r);
}
}
int main_merge() {
int data[] = {12, 11, 13, 5, 6, 7};
int n = 6;
printf("Unsorted: ");
printArray(data, n);
mergeSort(data, 0, n - 1); // Note the 0 and n-1
printf("Sorted: ");
printArray(data, n); // Output: 5 6 7 11 12 13
return 0;
}
5. Quick Sort (Efficient, $O(n \log n)$ average)
Quick Sort is another "Divide and Conquer" algorithm, but it works differently. It picks an element as a "pivot" and partitions the array around the pivot. All elements smaller than the pivot are moved to its left, and all elements larger are moved to its right. This is done recursively for the left and right sub-arrays.
Quick Sort is often faster than Merge Sort in practice, but its "worst-case" performance is $O(n^2)$ if the pivots are chosen poorly (e.g., always picking the smallest or largest element).
Simulation (Array: [7, 2, 1, 8, 6, 5], Pivot=5)
Partition Pass: (i starts at -1)
- j=7 (7 < 5? No)
- j=2 (2 < 5? Yes) -> i=0. Swap arr[0] (7) with arr[1] (2) -> [2, 7, 1, 8, 6, 5]
- j=1 (1 < 5? Yes) -> i=1. Swap arr[1] (7) with arr[2] (1) -> [2, 1, 7, 8, 6, 5]
- j=8 (8 < 5? No)
- j=6 (6 < 5? No)
Final Swap: Swap pivot (5) with arr[i+1] (7)
- Result: [2, 1, 5, 8, 6, 7]
- Now recursively sort [2, 1] and [8, 6, 7]
#include <stdio.h>
// We need a swap function (defined above)
// Partition function
int partition(int arr[], int low, int high) {
int pivot = arr[high]; // pivot
int i = (low - 1); // Index of smaller element
for (int j = low; j <= high - 1; j++) {
// If current element is smaller than the pivot
if (arr[j] < pivot) {
i++; // increment index of smaller element
swap(&arr[i], &arr[j]);
}
}
swap(&arr[i + 1], &arr[high]);
return (i + 1);
}
// Main quickSort function
void quickSort(int arr[], int low, int high) {
if (low < high) {
// pi is partitioning index
int pi = partition(arr, low, high);
// Separately sort elements before partition
quickSort(arr, low, pi - 1);
// and after partition
quickSort(arr, pi + 1, high);
}
}
int main_quick() {
int data[] = {10, 7, 8, 9, 1, 5};
int n = 6;
printf("Unsorted: ");
printArray(data, n);
quickSort(data, 0, n - 1);
printf("Sorted: ");
printArray(data, n); // Output: 1 5 7 8 9 10
return 0;
}
Which Sort is Faster? A Graphical and Mathematical Comparison
When we say "faster," we mean: how does the algorithm's runtime scale as the input size ($n$) gets larger? This is expressed using Big O notation.
Mathematical Comparison
- $O(n^2)$ (Quadratic Time): Bubble, Selection, Insertion.
- $O(n \log n)$ (Log-Linear Time): Merge Sort, Quick Sort (on average).
What does this mean? If you double the input size $n$:
- An $O(n^2)$ algorithm will take roughly $2^2 = 4$ times as long.
- An $O(n \log n)$ algorithm will take slightly more than $2$ times as long.
If $n = 1,000,000$:
- $n^2$ is $1,000,000,000,000$ (a trillion operations - will take minutes or hours)
- $n \log n$ is approx. $20,000,000$ (a few million operations - will take less than a second)
This mathematical difference is the most important concept in algorithmic efficiency.
Graphical Comparison
Imagine plotting the number of items ($n$) on the x-axis and the time taken on the y-axis.
- The $O(n^2)$ line would look like a steep, upward-curving parabola ( 📈 ). It starts slow but becomes unusable very quickly.
- The $O(n \log n)$ line would look like an almost-straight line, rising very slowly ( ↗️ ). It scales extremely well.
Algorithm "Cheat Sheet"
| Algorithm |
Best Case |
Average Case |
Worst Case |
Space |
| Bubble Sort |
$O(n)$ (w/ opt.) |
$O(n^2)$ |
$O(n^2)$ |
$O(1)$ |
| Selection Sort |
$O(n^2)$ |
$O(n^2)$ |
$O(n^2)$ |
$O(1)$ |
| Insertion Sort |
$O(n)$ |
$O(n^2)$ |
$O(n^2)$ |
$O(1)$ |
| Merge Sort |
$O(n \log n)$ |
$O(n \log n)$ |
$O(n \log n)$ |
$O(n)$ |
| Quick Sort |
$O(n \log n)$ |
$O(n \log n)$ |
$O(n^2)$ |
$O(\log n)$ |
*Space: $O(1)$ is "in-place" (no extra memory). $O(n)$ or $O(\log n)$ is the extra memory needed.*
Structs (Structures)
A struct is a user-defined data type that groups together variables of potentially different types. Think of it as creating a custom "template" for your data.
// Define a new type called "struct Student"
struct Student {
int studentID;
char name[100];
float gpa;
};
int main() {
// Create an instance (variable) of this type
struct Student s1;
// Access and modify members using the dot (.) operator
s1.studentID = 1001;
strcpy(s1.name, "Alice Smith"); // Use strcpy for strings!
s1.gpa = 3.8;
// We can also initialize at declaration
struct Student s2 = {1002, "Bob Johnson", 3.5};
printf("Student 1 Name: %s\n", s1.name);
printf("Student 2 ID: %d\n", s2.studentID);
return 0;
}
Arrays of Structs
This is where structs become truly powerful. You can create an array where each element is a complex `struct`. This allows you to store and manage a list of records, like a database table.
#include <stdio.h>
#include <string.h>
struct Student {
int studentID;
char name[100];
float gpa;
};
int main() {
// Create an array that holds 3 Student structs
struct Student classroom[3];
// Populate the first student
classroom[0].studentID = 1001;
strcpy(classroom[0].name, "Alice");
classroom[0].gpa = 3.8;
// Populate the second student
classroom[1].studentID = 1002;
strcpy(classroom[1].name, "Bob");
classroom[1].gpa = 3.5;
// Populate the third student
classroom[2].studentID = 1003;
strcpy(classroom[2].name, "Charlie");
classroom[2].gpa = 3.9;
// Loop through the array and print the data
printf("Class Roster:\n");
for (int i = 0; i < 3; i++) {
printf(" ID: %d, Name: %s, GPA: %.2f\n",
classroom[i].studentID,
classroom[i].name,
classroom[i].gpa);
}
return 0;
}
Sorting Arrays of Integers and Structures
The logic for sorting an array of structs is the same as for integers. The only difference is the comparison step. Instead of comparing `arr[j] > arr[j+1]`, you compare the specific member of the struct you want to sort by (e.g., `arr[j].studentID > arr[j+1].studentID`).
The swap operation must swap the entire struct, not just the member.
Simulation: Sorting an Array of Structs
#include <stdio.h>
#include <string.h>
struct Student {
int studentID;
char name[100];
float gpa;
};
// A swap function that swaps two *entire* Student structs
void swapStudents(struct Student *a, struct Student *b) {
struct Student temp = *a;
*a = *b;
*b = temp;
}
// Bubble sort adapted for an array of Student structs
// Sorts by studentID
void sortStudentsByID(struct Student arr[], int n) {
for (int i = 0; i < n - 1; i++) {
for (int j = 0; j < n - 1 - i; j++) {
// The only change is the comparison logic:
if (arr[j].studentID > arr[j + 1].studentID) {
swapStudents(&arr[j], &arr[j + 1]);
}
}
}
}
void printRoster(struct Student arr[], int n) {
for (int i = 0; i < n; i++) {
printf(" ID: %d, Name: %s, GPA: %.2f\n",
arr[i].studentID, arr[i].name, arr[i].gpa);
}
}
int main() {
struct Student classroom[3] = {
{1003, "Charlie", 3.9},
{1001, "Alice", 3.8},
{1002, "Bob", 3.5}
};
int n = 3;
printf("Unsorted Roster:\n");
printRoster(classroom, n);
sortStudentsByID(classroom, n);
printf("\nSorted Roster (by ID):\n");
printRoster(classroom, n);
return 0;
}
Reading, Storing, and Processing a Stream of Data
You can combine loops, `scanf`, and arrays of structs to read a stream of data from the user until a certain condition is met (e.g., the array is full, or the user enters a sentinel value like -1).
#include <stdio.h>
struct Student {
int studentID;
float gpa;
};
#define MAX_STUDENTS 10
int main() {
struct Student classroom[MAX_STUDENTS];
int student_count = 0;
float gpa_sum = 0;
printf("Enter student ID and GPA (e.g., 1001 3.8)\n");
printf("Enter -1 to stop.\n");
while (student_count < MAX_STUDENTS) {
printf("Student %d: ", student_count + 1);
int id_in;
float gpa_in;
if (scanf("%d", &id_in) != 1) break; // Check input
if (id_in == -1) {
break; // Stop loop
}
if (scanf("%f", &gpa_in) != 1) break; // Check input
classroom[student_count].studentID = id_in;
classroom[student_count].gpa = gpa_in;
gpa_sum += gpa_in;
student_count++;
}
printf("\n--- Data Entry Complete ---\n");
printf("Total students entered: %d\n", student_count);
if (student_count > 0) {
float average_gpa = gpa_sum / student_count;
printf("Average GPA: %.2f\n", average_gpa);
}
return 0;
}
Several Sorting Arrays (Multi-Key Sorting)
This is an advanced topic. What if you want to sort by GPA, but if two students have the same GPA, you want to sort them by name? This is called a multi-key sort. You simply add a secondary check inside your comparison `if` statement.
Simulation: Multi-Key Sort (by GPA, then by Name)
#include <stdio.h>
#include <string.h>
struct Student {
int studentID;
char name[100];
float gpa;
};
void swapStudents(struct Student *a, struct Student *b) {
struct Student temp = *a;
*a = *b;
*b = temp;
}
// Sorts by GPA (descending), then by Name (ascending)
void multiKeySort(struct Student arr[], int n) {
for (int i = 0; i < n - 1; i++) {
for (int j = 0; j < n - 1 - i; j++) {
// Primary Key: GPA (Descending)
if (arr[j].gpa < arr[j + 1].gpa) {
swapStudents(&arr[j], &arr[j + 1]);
}
// Secondary Key: Name (Ascending)
// Check this *only if* the primary keys are equal
else if (arr[j].gpa == arr[j + 1].gpa) {
// strcmp returns > 0 if str1 is "greater" than str2
if (strcmp(arr[j].name, arr[j + 1].name) > 0) {
swapStudents(&arr[j], &arr[j + 1]);
}
}
}
}
}
void printRoster(struct Student arr[], int n) {
for (int i = 0; i < n; i++) {
printf(" GPA: %.2f, Name: %s, ID: %d\n",
arr[i].gpa, arr[i].name, arr[i].studentID);
}
}
int main() {
struct Student roster[4] = {
{1001, "Charlie", 3.8},
{1002, "Alice", 3.9},
{1003, "Bob", 3.8},
{1004, "David", 3.9}
};
int n = 4;
printf("Unsorted Roster:\n");
printRoster(roster, n);
multiKeySort(roster, n);
printf("\nSorted Roster (by GPA desc, then Name asc):\n");
printRoster(roster, n);
return 0;
}