Topic 2: Organizing Data: Easy Sorting and Structures

With the basics of arrays and functions, we can now explore more powerful ways to organize and process data. Structs let us create custom data types, and sorting allows us to order that data meaningfully.

Sorting Algorithms: From Simple to Efficient

Sorting is the process of arranging elements in a specific order. While Bubble Sort is easy to understand, it's very slow. Let's explore several different methods, analyze their speed, and see why efficient sorting is one of the most important concepts in computer science.

1. Bubble Sort (Simple, $O(n^2)$)

As covered, Bubble Sort repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. Smaller (or larger) elements "bubble" up to their correct position.

Simulation (Array: [5, 1, 4, 2])
Pass 1:
[5, 1, 4, 2] -> Swap -> [1, 5, 4, 2]
[1, 5, 4, 2] -> Swap -> [1, 4, 5, 2]
[1, 4, 5, 2] -> Swap -> [1, 4, 2, 5] (5 is sorted)
Pass 2:
[1, 4, 2, 5] -> No Swap
[1, 4, 2, 5] -> Swap -> [1, 2, 4, 5] (4 is sorted)
Pass 3:
[1, 2, 4, 5] -> No Swap (1 is sorted)
Final: [1, 2, 4, 5]

#include <stdio.h>

void swap(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

void bubbleSort(int arr[], int n) {
    // Outer loop for number of passes
    // After i-th pass, the last i elements are sorted
    for (int i = 0; i < n - 1; i++) {
        int swapped = 0; // Optimization
        // Inner loop for comparisons
        // We stop at n-1-i because the end is already sorted
        for (int j = 0; j < n - 1 - i; j++) {
            if (arr[j] > arr[j + 1]) { // Compare adjacent
                swap(&arr[j], &arr[j + 1]);
                swapped = 1;
            }
        }
        // If no swaps in a pass, array is sorted
        if (swapped == 0) {
            break; 
        }
    }
}

void printArray(int arr[], int size) {
    for (int i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

int main() {
    int data[] = {64, 34, 25, 12, 22, 11, 90};
    int n = 7;
    
    printf("Unsorted array: ");
    printArray(data, n);
    
    bubbleSort(data, n);
    
    printf("Sorted array:   ");
    printArray(data, n);
    
    return 0;
}

2. Selection Sort (Simple, $O(n^2)$)

Selection Sort works by dividing the array into two parts: a sorted part at the beginning and an unsorted part at the end. The algorithm repeatedly finds the smallest element from the unsorted part and swaps it with the first element of the unsorted part, thus growing the sorted part.

Simulation (Array: [64, 25, 12, 22])
Pass 1: (Find smallest in [64, 25, 12, 22])
- Smallest is 12. Swap with 64.
- Result: [12, 25, 64, 22]
Pass 2: (Find smallest in [25, 64, 22])
- Smallest is 22. Swap with 25.
- Result: [12, 22, 64, 25]
Pass 3: (Find smallest in [64, 25])
- Smallest is 25. Swap with 64.
- Result: [12, 22, 25, 64]
Final: [12, 22, 25, 64]

#include <stdio.h>

// We already have swap() and printArray() from above

void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        // Find the index of the minimum element in the unsorted part
        int min_index = i;
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[min_index]) {
                min_index = j;
            }
        }
        
        // Swap the found minimum element with the first unsorted element
        if (min_index != i) {
            swap(&arr[i], &arr[min_index]);
        }
    }
}

int main_selection() {
    int data[] = {64, 25, 12, 22, 11};
    int n = 5;
    printf("Unsorted: ");
    printArray(data, n);
    
    selectionSort(data, n);
    
    printf("Sorted:   ");
    printArray(data, n); // Output: 11 12 22 25 64
    return 0;
}

3. Insertion Sort (Simple, $O(n^2)$)

Insertion Sort also builds the final sorted array one item at a time. It's much like sorting playing cards in your hand. It iterates through the array, and for each element, it "inserts" it into its correct position within the already-sorted part of the array to its left.

Simulation (Array: [12, 11, 13, 5])
- Sorted part: [12]
Take 11: Shift 12 right. Insert 11.
- Sorted part: [11, 12]
Take 13: 13 > 12. No shift. Insert 13.
- Sorted part: [11, 12, 13]
Take 5: 5 < 13, 5 < 12, 5 < 11. Shift 13, 12, 11 right. Insert 5.
- Sorted part: [5, 11, 12, 13]
Final: [5, 11, 12, 13]

#include <stdio.h>

// We already have printArray() from above

void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        // The element to be inserted
        int key = arr[i];
        int j = i - 1;
        
        // Move elements of arr[0..i-1] that are greater than
        // the key, to one position ahead
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        // Insert the key at its correct position
        arr[j + 1] = key;
    }
}

int main_insertion() {
    int data[] = {12, 11, 13, 5, 6};
    int n = 5;
    printf("Unsorted: ");
    printArray(data, n);
    
    insertionSort(data, n);
    
    printf("Sorted:   ");
    printArray(data, n); // Output: 5 6 11 12 13
    return 0;
}

4. Merge Sort (Efficient, $O(n \log n)$)

Now we move to the fast algorithms. Merge Sort is a "Divide and Conquer" algorithm. It works by:

Divide: Recursively splitting the array in half until it has many tiny arrays of 1 element (which are, by definition, sorted).
Conquer: Merging the tiny sorted arrays back together in the correct order, creating larger sorted arrays.
Combine: Eventually, it merges the final two sorted halves into the fully sorted array.

Its main cost is that it requires extra memory (a "temp" array) to do the merging.

Simulation (Array: [5, 2, 4, 1])
Divide:
- Split: [5, 2] and [4, 1]
- Split: [5], [2] and [4], [1]
Conquer (Merge):
- Merge [5] and [2] -> [2, 5]
- Merge [4] and [1] -> [1, 4]
- Merge [2, 5] and [1, 4] -> [1, 2, 4, 5]
Final: [1, 2, 4, 5]

#include <stdio.h>
#include <stdlib.h> // For malloc/free

// Merge two subarrays L and R into arr
void merge(int arr[], int l, int m, int r) {
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;

    // Create temp arrays
    int L[n1], R[n2];

    // Copy data to temp arrays L[] and R[]
    for (i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];

    // Merge the temp arrays back into arr[l..r]
    i = 0; // Initial index of first subarray
    j = 0; // Initial index of second subarray
    k = l; // Initial index of merged subarray
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    // Copy remaining elements of L[]
    while (i < n1) { arr[k] = L[i]; i++; k++; }
    // Copy remaining elements of R[]
    while (j < n2) { arr[k] = R[j]; j++; k++; }
}

// Main mergeSort function
void mergeSort(int arr[], int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2; // Avoids overflow
        // Sort first and second halves
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);
        // Merge the sorted halves
        merge(arr, l, m, r);
    }
}

int main_merge() {
    int data[] = {12, 11, 13, 5, 6, 7};
    int n = 6;
    printf("Unsorted: ");
    printArray(data, n);
    
    mergeSort(data, 0, n - 1); // Note the 0 and n-1
    
    printf("Sorted:   ");
    printArray(data, n); // Output: 5 6 7 11 12 13
    return 0;
}

5. Quick Sort (Efficient, $O(n \log n)$ average)

Quick Sort is another "Divide and Conquer" algorithm, but it works differently. It picks an element as a "pivot" and partitions the array around the pivot. All elements smaller than the pivot are moved to its left, and all elements larger are moved to its right. This is done recursively for the left and right sub-arrays.

Quick Sort is often faster than Merge Sort in practice, but its "worst-case" performance is $O(n^2)$ if the pivots are chosen poorly (e.g., always picking the smallest or largest element).

Simulation (Array: [7, 2, 1, 8, 6, 5], Pivot=5)
Partition Pass: (i starts at -1)
- j=7 (7 < 5? No)
- j=2 (2 < 5? Yes) -> i=0. Swap arr[0] (7) with arr[1] (2) -> [2, 7, 1, 8, 6, 5]
- j=1 (1 < 5? Yes) -> i=1. Swap arr[1] (7) with arr[2] (1) -> [2, 1, 7, 8, 6, 5]
- j=8 (8 < 5? No)
- j=6 (6 < 5? No)
Final Swap: Swap pivot (5) with arr[i+1] (7)
- Result: [2, 1, 5, 8, 6, 7]
- Now recursively sort [2, 1] and [8, 6, 7]

#include <stdio.h>

// We need a swap function (defined above)

// Partition function
int partition(int arr[], int low, int high) {
    int pivot = arr[high]; // pivot
    int i = (low - 1); // Index of smaller element

    for (int j = low; j <= high - 1; j++) {
        // If current element is smaller than the pivot
        if (arr[j] < pivot) {
            i++; // increment index of smaller element
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]);
    return (i + 1);
}

// Main quickSort function
void quickSort(int arr[], int low, int high) {
    if (low < high) {
        // pi is partitioning index
        int pi = partition(arr, low, high);
        
        // Separately sort elements before partition
        quickSort(arr, low, pi - 1);
        // and after partition
        quickSort(arr, pi + 1, high);
    }
}

int main_quick() {
    int data[] = {10, 7, 8, 9, 1, 5};
    int n = 6;
    printf("Unsorted: ");
    printArray(data, n);
    
    quickSort(data, 0, n - 1);
    
    printf("Sorted:   ");
    printArray(data, n); // Output: 1 5 7 8 9 10
    return 0;
}

Which Sort is Faster? A Graphical and Mathematical Comparison

When we say "faster," we mean: how does the algorithm's runtime scale as the input size ($n$) gets larger? This is expressed using Big O notation.

Mathematical Comparison

$O(n^2)$ (Quadratic Time): Bubble, Selection, Insertion.
$O(n \log n)$ (Log-Linear Time): Merge Sort, Quick Sort (on average).

What does this mean? If you double the input size $n$:

An $O(n^2)$ algorithm will take roughly $2^2 = 4$ times as long.
An $O(n \log n)$ algorithm will take slightly more than $2$ times as long.

If $n = 1,000,000$:

$n^2$ is $1,000,000,000,000$ (a trillion operations - will take minutes or hours)
$n \log n$ is approx. $20,000,000$ (a few million operations - will take less than a second)

This mathematical difference is the most important concept in algorithmic efficiency.

Graphical Comparison

Imagine plotting the number of items ($n$) on the x-axis and the time taken on the y-axis.

The $O(n^2)$ line would look like a steep, upward-curving parabola ( 📈 ). It starts slow but becomes unusable very quickly.
The $O(n \log n)$ line would look like an almost-straight line, rising very slowly ( ↗️ ). It scales extremely well.

Algorithm "Cheat Sheet"

Algorithm	Best Case	Average Case	Worst Case	Space
Bubble Sort	$O(n)$ (w/ opt.)	$O(n^2)$	$O(n^2)$	$O(1)$
Selection Sort	$O(n^2)$	$O(n^2)$	$O(n^2)$	$O(1)$
Insertion Sort	$O(n)$	$O(n^2)$	$O(n^2)$	$O(1)$
Merge Sort	$O(n \log n)$	$O(n \log n)$	$O(n \log n)$	$O(n)$
Quick Sort	$O(n \log n)$	$O(n \log n)$	$O(n^2)$	$O(\log n)$

*Space: $O(1)$ is "in-place" (no extra memory). $O(n)$ or $O(\log n)$ is the extra memory needed.*

Structs (Structures)

A struct is a user-defined data type that groups together variables of potentially different types. Think of it as creating a custom "template" for your data.

// Define a new type called "struct Student"
struct Student {
    int studentID;
    char name[100];
    float gpa;
};

int main() {
    // Create an instance (variable) of this type
    struct Student s1;

    // Access and modify members using the dot (.) operator
    s1.studentID = 1001;
    strcpy(s1.name, "Alice Smith"); // Use strcpy for strings!
    s1.gpa = 3.8;

    // We can also initialize at declaration
    struct Student s2 = {1002, "Bob Johnson", 3.5};

    printf("Student 1 Name: %s\n", s1.name);
    printf("Student 2 ID:   %d\n", s2.studentID);

    return 0;
}

Arrays of Structs

This is where structs become truly powerful. You can create an array where each element is a complex `struct`. This allows you to store and manage a list of records, like a database table.

#include <stdio.h>
#include <string.h>

struct Student {
    int studentID;
    char name[100];
    float gpa;
};

int main() {
    // Create an array that holds 3 Student structs
    struct Student classroom[3];

    // Populate the first student
    classroom[0].studentID = 1001;
    strcpy(classroom[0].name, "Alice");
    classroom[0].gpa = 3.8;

    // Populate the second student
    classroom[1].studentID = 1002;
    strcpy(classroom[1].name, "Bob");
    classroom[1].gpa = 3.5;
    
    // Populate the third student
    classroom[2].studentID = 1003;
    strcpy(classroom[2].name, "Charlie");
    classroom[2].gpa = 3.9;

    // Loop through the array and print the data
    printf("Class Roster:\n");
    for (int i = 0; i < 3; i++) {
        printf("  ID: %d, Name: %s, GPA: %.2f\n", 
               classroom[i].studentID, 
               classroom[i].name, 
               classroom[i].gpa);
    }

    return 0;
}

Sorting Arrays of Integers and Structures

The logic for sorting an array of structs is the same as for integers. The only difference is the comparison step. Instead of comparing `arr[j] > arr[j+1]`, you compare the specific member of the struct you want to sort by (e.g., `arr[j].studentID > arr[j+1].studentID`).

The swap operation must swap the entire struct, not just the member.

Simulation: Sorting an Array of Structs

#include <stdio.h>
#include <string.h>

struct Student {
    int studentID;
    char name[100];
    float gpa;
};

// A swap function that swaps two *entire* Student structs
void swapStudents(struct Student *a, struct Student *b) {
    struct Student temp = *a;
    *a = *b;
    *b = temp;
}

// Bubble sort adapted for an array of Student structs
// Sorts by studentID
void sortStudentsByID(struct Student arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - 1 - i; j++) {
            // The only change is the comparison logic:
            if (arr[j].studentID > arr[j + 1].studentID) {
                swapStudents(&arr[j], &arr[j + 1]);
            }
        }
    }
}

void printRoster(struct Student arr[], int n) {
    for (int i = 0; i < n; i++) {
        printf("  ID: %d, Name: %s, GPA: %.2f\n", 
               arr[i].studentID, arr[i].name, arr[i].gpa);
    }
}

int main() {
    struct Student classroom[3] = {
        {1003, "Charlie", 3.9},
        {1001, "Alice", 3.8},
        {1002, "Bob", 3.5}
    };
    int n = 3;

    printf("Unsorted Roster:\n");
    printRoster(classroom, n);

    sortStudentsByID(classroom, n);

    printf("\nSorted Roster (by ID):\n");
    printRoster(classroom, n);

    return 0;
}

Reading, Storing, and Processing a Stream of Data

You can combine loops, `scanf`, and arrays of structs to read a stream of data from the user until a certain condition is met (e.g., the array is full, or the user enters a sentinel value like -1).

#include <stdio.h>

struct Student {
    int studentID;
    float gpa;
};
#define MAX_STUDENTS 10

int main() {
    struct Student classroom[MAX_STUDENTS];
    int student_count = 0;
    float gpa_sum = 0;

    printf("Enter student ID and GPA (e.g., 1001 3.8)\n");
    printf("Enter -1 to stop.\n");

    while (student_count < MAX_STUDENTS) {
        printf("Student %d: ", student_count + 1);
        
        int id_in;
        float gpa_in;

        if (scanf("%d", &id_in) != 1) break; // Check input
        
        if (id_in == -1) {
            break; // Stop loop
        }
        
        if (scanf("%f", &gpa_in) != 1) break; // Check input

        classroom[student_count].studentID = id_in;
        classroom[student_count].gpa = gpa_in;
        
        gpa_sum += gpa_in;
        student_count++;
    }

    printf("\n--- Data Entry Complete ---\n");
    printf("Total students entered: %d\n", student_count);

    if (student_count > 0) {
        float average_gpa = gpa_sum / student_count;
        printf("Average GPA: %.2f\n", average_gpa);
    }

    return 0;
}

Several Sorting Arrays (Multi-Key Sorting)

This is an advanced topic. What if you want to sort by GPA, but if two students have the same GPA, you want to sort them by name? This is called a multi-key sort. You simply add a secondary check inside your comparison `if` statement.

Simulation: Multi-Key Sort (by GPA, then by Name)

#include <stdio.h>
#include <string.h>

struct Student {
    int studentID;
    char name[100];
    float gpa;
};

void swapStudents(struct Student *a, struct Student *b) {
    struct Student temp = *a;
    *a = *b;
    *b = temp;
}

// Sorts by GPA (descending), then by Name (ascending)
void multiKeySort(struct Student arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - 1 - i; j++) {
            
            // Primary Key: GPA (Descending)
            if (arr[j].gpa < arr[j + 1].gpa) {
                swapStudents(&arr[j], &arr[j + 1]);
            } 
            // Secondary Key: Name (Ascending)
            // Check this *only if* the primary keys are equal
            else if (arr[j].gpa == arr[j + 1].gpa) {
                // strcmp returns > 0 if str1 is "greater" than str2
                if (strcmp(arr[j].name, arr[j + 1].name) > 0) {
                    swapStudents(&arr[j], &arr[j + 1]);
                }
            }
        }
    }
}

void printRoster(struct Student arr[], int n) {
     for (int i = 0; i < n; i++) {
        printf("  GPA: %.2f, Name: %s, ID: %d\n", 
               arr[i].gpa, arr[i].name, arr[i].studentID);
    }
}

int main() {
    struct Student roster[4] = {
        {1001, "Charlie", 3.8},
        {1002, "Alice", 3.9},
        {1003, "Bob", 3.8},
        {1004, "David", 3.9}
    };
    int n = 4;

    printf("Unsorted Roster:\n");
    printRoster(roster, n);

    multiKeySort(roster, n);

    printf("\nSorted Roster (by GPA desc, then Name asc):\n");
    printRoster(roster, n);

    return 0;
}

Topic 1: The Core Building Blocks: Arrays, Strings, and Functions

Arrays

Functions

Passing Arrays to Functions

Characters and Strings

Simulation: Manual String Length

Reading and Writing a Sequence of Numbers

Algorithms: Swapping and Min/Max

Swapping a Sequence (or elements)

Finding Min/Max in an Unsorted Sequence

Topic 2: Organizing Data: Easy Sorting and Structures

Sorting Algorithms: From Simple to Efficient

1. Bubble Sort (Simple, $O(n^2)$)

2. Selection Sort (Simple, $O(n^2)$)

3. Insertion Sort (Simple, $O(n^2)$)

4. Merge Sort (Efficient, $O(n \log n)$)

5. Quick Sort (Efficient, $O(n \log n)$ average)

Which Sort is Faster? A Graphical and Mathematical Comparison

Mathematical Comparison

Graphical Comparison

Algorithm "Cheat Sheet"

Structs (Structures)

Arrays of Structs

Sorting Arrays of Integers and Structures

Simulation: Sorting an Array of Structs

Reading, Storing, and Processing a Stream of Data

Several Sorting Arrays (Multi-Key Sorting)

Simulation: Multi-Key Sort (by GPA, then by Name)

Practice Problems

Topic 1: Arrays, Strings & Functions

Easy Problems

Medium Problems

Hard Problems

Topic 2: Sorting & Structs

Easy Problems

Medium Problems

Hard Problems

Problem Solutions

Topic 1 Solutions

Easy 1: Sum of an Array

Easy 2: Reverse a String

Easy 3: Find Element

Medium 1: Manual `strcpy`

Medium 2: Character Frequency

Medium 3: Palindrome Check (Case-Insensitive)

Medium 4: Array Average

Hard 1: Remove Whitespace (In-Place)

Hard 2: Second Largest Element

Hard 3: Matrix Addition

Topic 2 Solutions

Easy 1: Define `struct Book`

Easy 2: Implement Bubble Sort

Easy 3: Find by ID

Medium 1: Read Book Array

Medium 2: Sort Books by Year

Medium 3: Implement Selection Sort

Medium 4: Product Average

Hard 1: Multi-Key Sort

Hard 2: Insertion Sort (for Structs)

Hard 3: Parse CSV String