3.4 Data Structures Overview¶
What Are Data Structures?¶
A data structure is a way of organizing and storing data so that it can be accessed and modified efficiently. Just like different containers in your kitchen (bowls, drawers, spice racks) are designed for different types of items, different data structures are designed for different types of data and operations.
Think of data structures as the “containers” that hold your program’s information, each with its own strengths and best use cases.
Why Do We Need Different Data Structures?¶
Different problems require different solutions:
-
Shopping list: A simple list works perfectly
-
Phone book: A hash table allows instant lookup by name
-
Family tree: A tree structure shows relationships clearly
-
Undo function: A stack remembers the last action first
Choosing the right data structure can make your program faster, use less memory, and be easier to understand.
Single-Dimensional Arrays (Python Lists)¶
What They Are¶
In Python, we use lists to represent single-dimensional arrays. A list is an ordered collection of items that can be accessed by their position (index).
# Creating a simple list
grades = [85, 92, 78, 96, 88]
names = ["Alice", "Bob", "Charlie", "Diana"]
mixed_data = [1, "hello", 3.14, True]
# Accessing elements by index (starts at 0)
print(grades[0]) # Output: 85 (first element)
print(names[1]) # Output: "Bob" (second element)
print(grades[-1]) # Output: 88 (last element)
Common Operations¶
student_scores = [75, 82, 91]
# Adding elements
student_scores.append(88) # Add to end: [75, 82, 91, 88]
student_scores.insert(1, 79) # Insert at position 1: [75, 79, 82, 91, 88]
# Removing elements
student_scores.remove(82) # Remove first occurrence: [75, 79, 91, 88]
last_score = student_scores.pop() # Remove and return last: 88
# Finding elements
position = student_scores.index(91) # Find position of 91
length = len(student_scores) # Get number of elements
print(f"Score 91 is at position {position}")
print(f"We have {length} scores")
When to Use Lists¶
-
Storing sequences of similar items
-
When you need to maintain order
-
When you need to access items by position
-
For collections that grow and shrink
Multidimensional Arrays (Lists of Lists)¶
2D Arrays - Tables and Grids¶
A 2D array is like a table with rows and columns. In Python, we create them using lists of lists.
# Grade book: 3 students, 4 subjects
# Each row is a student, each column is a subject
grade_book = [
[85, 92, 78, 88], # Alice's grades
[91, 87, 94, 89], # Bob's grades
[76, 83, 81, 85] # Charlie's grades
]
# Accessing specific grades
alice_math = grade_book[0][1] # Alice's math grade (row 0, column 1)
bob_science = grade_book[1][2] # Bob's science grade (row 1, column 2)
print(f"Alice's math grade: {alice_math}")
print(f"Bob's science grade: {bob_science}")
Practical Example: Seating Chart¶
# Movie theater seating (3 rows, 5 seats each)
# True = occupied, False = available
seating_chart = [
[True, False, True, False, False], # Row 1
[False, False, False, True, True], # Row 2
[True, True, False, False, True] # Row 3
]
def find_available_seats():
"""Find all available seats in the theater."""
available = []
for row in range(len(seating_chart)):
for seat in range(len(seating_chart[row])):
if not seating_chart[row][seat]: # If seat is False (available)
available.append(f"Row {row + 1}, Seat {seat + 1}")
return available
print("Available seats:", find_available_seats())
Records (Dictionaries)¶
What They Are¶
A record groups related information together. In Python, we use dictionaries to create records.
# Student record - grouping related information
student = {
"name": "Sarah Johnson",
"age": 17,
"grade_average": 87.5,
"subjects": ["Math", "Science", "English", "History"],
"enrollment_status": True
}
# Accessing record fields
print(f"Student: {student['name']}")
print(f"Age: {student['age']}")
print(f"Average: {student['grade_average']}%")
Multiple Records¶
# Class roster - list of student records
class_roster = [
{"name": "Alice", "age": 16, "grade": 85},
{"name": "Bob", "age": 17, "grade": 92},
{"name": "Charlie", "age": 16, "grade": 78}
]
# Finding students with grades above 80
high_achievers = []
for student in class_roster:
if student["grade"] > 80:
high_achievers.append(student["name"])
print("High achievers:", high_achievers)
Trees (Conceptual Introduction)¶
What They Are¶
A tree is a hierarchical data structure that starts with a root and branches out like a family tree or organization chart.
# Simple family tree using nested dictionaries
family_tree = {
"name": "Grandpa John",
"children": [
{
"name": "Dad Mike",
"children": [
{"name": "Sarah", "children": []},
{"name": "Tom", "children": []}
]
},
{
"name": "Aunt Lisa",
"children": [
{"name": "Emma", "children": []}
]
}
]
}
def print_family(person, level=0):
"""Print family tree with indentation to show levels."""
indent = " " * level
print(f"{indent}{person['name']}")
for child in person['children']:
print_family(child, level + 1)
print("Family Tree:")
print_family(family_tree)
When to Use Trees¶
-
Representing hierarchical relationships (family trees, organization charts)
-
File system structures (folders and subfolders)
-
Decision-making processes
-
Parsing and organizing structured data
Stacks (Last In, First Out)¶
What They Are¶
A stack works like a stack of plates - you can only add or remove from the top. This is called “Last In, First Out” (LIFO).
# Using a Python list as a stack
browser_history = []
# "Push" - add to top of stack
browser_history.append("google.com")
browser_history.append("youtube.com")
browser_history.append("github.com")
print("Current history:", browser_history)
# "Pop" - remove from top of stack
last_page = browser_history.pop()
print(f"Going back from: {last_page}")
print("History after going back:", browser_history)
# Check what's on top without removing
if browser_history:
current_page = browser_history[-1]
print(f"Current page: {current_page}")
Practical Example: Undo Function¶
class TextEditor:
def __init__(self):
self.text = ""
self.undo_stack = []
def type_text(self, new_text):
"""Add text and save current state for undo."""
self.undo_stack.append(self.text) # Save current state
self.text += new_text
def undo(self):
"""Restore previous state."""
if self.undo_stack:
self.text = self.undo_stack.pop()
return True
return False
# Using the text editor
editor = TextEditor()
editor.type_text("Hello ")
editor.type_text("World!")
print(f"Current text: '{editor.text}'")
editor.undo()
print(f"After undo: '{editor.text}'")
Hash Tables (Python Dictionaries)¶
What They Are¶
A hash table allows instant lookup of values using keys. Python dictionaries are hash tables!
# Phone book - instant lookup by name
phone_book = {
"Alice": "555-0101",
"Bob": "555-0102",
"Charlie": "555-0103",
"Diana": "555-0104"
}
# Instant lookup - no need to search through the whole list
alice_number = phone_book["Alice"]
print(f"Alice's number: {alice_number}")
# Adding new entries
phone_book["Emma"] = "555-0105"
# Checking if someone is in the phone book
if "Bob" in phone_book:
print(f"Bob's number is {phone_book['Bob']}")
When to Use Hash Tables¶
-
When you need fast lookup by a unique identifier
-
Counting occurrences of items
-
Caching/memoization
-
Database-like operations
# Counting letter frequencies
def count_letters(text):
"""Count how many times each letter appears."""
letter_counts = {}
for letter in text.lower():
if letter.isalpha(): # Only count letters
if letter in letter_counts:
letter_counts[letter] += 1
else:
letter_counts[letter] = 1
return letter_counts
message = "Hello World"
frequencies = count_letters(message)
print("Letter frequencies:", frequencies)
Sequential Files (CSV)¶
What They Are¶
Sequential files store data in a sequence, like lines in a text file. CSV (Comma-Separated Values) files are a common format for storing tabular data.
import csv
# Writing student data to CSV
students_data = [
["Name", "Age", "Grade"], # Header row
["Alice", 16, 85],
["Bob", 17, 92],
["Charlie", 16, 78]
]
# Write to CSV file
with open("students.csv", "w", newline="") as file:
writer = csv.writer(file)
writer.writerows(students_data)
print("Data written to students.csv")
# Reading from CSV file
with open("students.csv", "r") as file:
reader = csv.reader(file)
for row_number, row in enumerate(reader):
if row_number == 0:
print("Headers:", row)
else:
name, age, grade = row
print(f"Student: {name}, Age: {age}, Grade: {grade}")
Working with CSV Dictionaries¶
import csv
# Writing with headers
fieldnames = ["name", "age", "grade"]
students = [
{"name": "Alice", "age": 16, "grade": 85},
{"name": "Bob", "age": 17, "grade": 92}
]
with open("students_dict.csv", "w", newline="") as file:
writer = csv.DictWriter(file, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(students)
# Reading with headers
with open("students_dict.csv", "r") as file:
reader = csv.DictReader(file)
for student in reader:
print(f"{student['name']} is {student['age']} with grade {student['grade']}")
Choosing the Right Data Structure¶
Quick Reference Guide¶
| Need | Use | Example |
|---|---|---|
| Ordered collection, access by position | List | Shopping list, grades |
| Table/grid of data | 2D List | Spreadsheet, game board |
| Group related information | Dictionary (Record) | Student profile, product info |
| Hierarchical relationships | Tree | Family tree, file system |
| Undo/redo, reverse order | Stack | Browser history, text editor |
| Fast lookup by key | Hash Table (Dict) | Phone book, cache |
| Store/load data files | CSV | Export data, data exchange |
Performance Considerations (For Reference)¶
# Lists are good for:
# - Accessing by index: very fast
# - Adding to end: very fast
# - Searching for value: slow for large lists
# Dictionaries are good for:
# - Looking up by key: very fast
# - Adding new key-value pairs: very fast
# - No ordering (before Python 3.7)
Practice Exercises¶
Exercise 1: Grade Book System
Create a 2D list representing a grade book for 4 students and 3 subjects. Calculate the average grade for each student and each subject.
Sample Solution
# Create a 2D list for 4 students and 3 subjects
# Format: [[student1_grades], [student2_grades], ...]
grade_book = [
[85, 92, 78], # Student 1: Math, Science, English
[91, 88, 95], # Student 2
[76, 84, 89], # Student 3
[93, 79, 87] # Student 4
]
# Calculate average for each student
print("Student Averages:")
for i, grades in enumerate(grade_book):
student_avg = sum(grades) / len(grades)
print(f"Student {i+1}: {student_avg:.1f}")
# Calculate average for each subject
print("\nSubject Averages:")
num_subjects = len(grade_book[0])
for subject in range(num_subjects):
subject_grades = [student[subject] for student in grade_book]
subject_avg = sum(subject_grades) / len(subject_grades)
subject_names = ["Math", "Science", "English"]
print(f"{subject_names[subject]}: {subject_avg:.1f}")
# Find highest and lowest grades
all_grades = [grade for student in grade_book for grade in student]
print(f"\nOverall Statistics:")
print(f"Highest grade: {max(all_grades)}")
print(f"Lowest grade: {min(all_grades)}")
print(f"Class average: {sum(all_grades)/len(all_grades):.1f}")
This demonstrates:
-
2D list creation and access
-
List comprehensions for extracting columns
-
Nested loops for processing 2D data
-
Mathematical operations on collections
-
Formatted output for readability
Exercise 2: Inventory Management
Use a dictionary to track inventory items with their quantities. Implement functions to add items, remove items, and check if an item is in stock.
Sample Solution
def add_item(inventory, item_name, quantity):
"""Add quantity to existing item or create new item."""
if item_name in inventory:
inventory[item_name] += quantity
else:
inventory[item_name] = quantity
print(f"Added {quantity} {item_name}(s). New total: {inventory[item_name]}")
def remove_item(inventory, item_name, quantity):
"""Remove quantity from item if available."""
if item_name not in inventory:
print(f"Error: {item_name} not found in inventory")
return False
if inventory[item_name] < quantity:
print(f"Error: Only {inventory[item_name]} {item_name}(s) available")
return False
inventory[item_name] -= quantity
print(f"Removed {quantity} {item_name}(s). Remaining: {inventory[item_name]}")
return True
def is_in_stock(inventory, item_name, required_quantity=1):
"""Check if item is available in sufficient quantity."""
return item_name in inventory and inventory[item_name] >= required_quantity
def display_inventory(inventory):
"""Display all items and their quantities."""
if not inventory:
print("Inventory is empty")
return
print("Current Inventory:")
for item, quantity in inventory.items():
status = "IN STOCK" if quantity > 0 else "OUT OF STOCK"
print(f" {item}: {quantity} ({status})")
# Example usage
inventory = {}
# Add some items
add_item(inventory, "laptop", 5)
add_item(inventory, "mouse", 10)
add_item(inventory, "keyboard", 3)
# Check stock
print(f"\nMouse in stock (5 needed): {is_in_stock(inventory, 'mouse', 5)}")
print(f"Monitor in stock: {is_in_stock(inventory, 'monitor')}")
# Remove items
remove_item(inventory, "mouse", 3)
remove_item(inventory, "monitor", 1) # Should fail
# Display final inventory
print("\nFinal inventory:")
display_inventory(inventory)
This demonstrates:
-
Dictionary operations (add, access, modify)
-
Function definitions with parameters and return values
-
Conditional logic for validation
-
Error handling for invalid operations
-
Iteration over dictionary items
-
Modular design with separate functions
Exercise 3: Browser History
Implement a simple browser history using a stack. Add functions to visit a new page, go back, and display current history.
Sample Solution
class BrowserHistory:
def __init__(self):
self.history = [] # Stack to store visited pages
self.current_page = None
def visit_page(self, url):
"""Visit a new page, adding it to history."""
if self.current_page is not None:
self.history.append(self.current_page)
self.current_page = url
print(f"Visited: {url}")
def go_back(self):
"""Go back to the previous page."""
if not self.history:
print("Error: No previous page to go back to")
return False
previous_page = self.history.pop()
print(f"Went back to: {previous_page}")
self.current_page = previous_page
return True
def get_current_page(self):
"""Get the current page URL."""
return self.current_page
def display_history(self):
"""Display the browsing history."""
if not self.history and self.current_page is None:
print("No browsing history")
return
print("Browsing History (most recent first):")
if self.current_page:
print(f" CURRENT: {self.current_page}")
# Show history in reverse order (most recent first)
for i, page in enumerate(reversed(self.history), 1):
print(f" {i}: {page}")
def clear_history(self):
"""Clear all browsing history."""
self.history = []
self.current_page = None
print("History cleared")
# Example usage
browser = BrowserHistory()
# Visit some pages
browser.visit_page("https://www.google.com")
browser.visit_page("https://www.github.com")
browser.visit_page("https://www.python.org")
# Display history
browser.display_history()
# Go back
print("\nGoing back...")
browser.go_back()
browser.display_history()
# Go back again
print("\nGoing back again...")
browser.go_back()
browser.display_history()
# Try to go back too far
print("\nTrying to go back when at beginning...")
browser.go_back()
browser.go_back() # Should fail
This demonstrates:
-
Stack implementation using a list
-
Class design with methods and attributes
-
Error handling for invalid operations
-
List operations (append, pop)
-
Iteration and display formatting
-
State management for current position
Exercise 4: Student Database
Create a CSV file with student records and write functions to:
-
Read all students
-
Find students by grade range
-
Add new students
-
Calculate class average
Sample Solution
import csv
import os
def create_sample_data(filename):
"""Create a sample CSV file with student data."""
students = [
["ID", "Name", "Grade"],
["1", "Alice Johnson", "85.5"],
["2", "Bob Smith", "92.0"],
["3", "Carol Davis", "78.3"],
["4", "David Wilson", "88.7"],
["5", "Eva Brown", "91.2"]
]
with open(filename, 'w', newline='') as file:
writer = csv.writer(file)
writer.writerows(students)
print(f"Created sample data file: {filename}")
def read_all_students(filename):
"""Read and return all students from CSV file."""
students = []
try:
with open(filename, 'r') as file:
reader = csv.DictReader(file)
for row in reader:
# Convert grade to float
row['Grade'] = float(row['Grade'])
students.append(row)
except FileNotFoundError:
print(f"Error: File {filename} not found")
return students
def find_students_by_grade_range(filename, min_grade, max_grade):
"""Find students within a grade range."""
students = read_all_students(filename)
matching_students = []
for student in students:
if min_grade <= student['Grade'] <= max_grade:
matching_students.append(student)
return matching_students
def add_student(filename, student_id, name, grade):
"""Add a new student to the CSV file."""
# Check if student ID already exists
students = read_all_students(filename)
existing_ids = [s['ID'] for s in students]
if student_id in existing_ids:
print(f"Error: Student ID {student_id} already exists")
return False
# Add new student
with open(filename, 'a', newline='') as file:
writer = csv.writer(file)
writer.writerow([student_id, name, str(grade)])
print(f"Added student: {name} (ID: {student_id}, Grade: {grade})")
return True
def calculate_class_average(filename):
"""Calculate and return the class average grade."""
students = read_all_students(filename)
if not students:
return 0.0
total_grades = sum(student['Grade'] for student in students)
return total_grades / len(students)
# Example usage
filename = "students.csv"
# Create sample data
create_sample_data(filename)
# Read all students
print("\nAll students:")
students = read_all_students(filename)
for student in students:
print(f" {student['ID']}: {student['Name']} - {student['Grade']}")
# Find students in grade range
print("\nStudents with grades 80-90:")
high_performers = find_students_by_grade_range(filename, 80, 90)
for student in high_performers:
print(f" {student['Name']}: {student['Grade']}")
# Add new student
add_student(filename, "6", "Frank Miller", 87.5)
# Calculate class average
average = calculate_class_average(filename)
print(f"\nClass average: {average:.2f}")
# Clean up
if os.path.exists(filename):
os.remove(filename)
This demonstrates:
-
CSV file operations (read, write, append)
-
Dictionary usage for structured data
-
File error handling and validation
-
List comprehensions and filtering
-
Mathematical operations on data collections
-
Modular function design for data operations
Common Mistakes to Avoid¶
-
Index errors: Remember Python lists start at index 0
-
Modifying lists while iterating: Create a copy if you need to change the list
-
Forgetting CSV file modes: Use “w” for writing, “r” for reading
-
Not handling empty data structures: Check if lists/dictionaries are empty before accessing
Summary¶
Data structures are the foundation of efficient programming:
-
Lists: Perfect for ordered collections and sequences
-
2D Lists: Ideal for tables, grids, and matrices
-
Records (Dictionaries): Group related information together
-
Trees: Represent hierarchical relationships
-
Stacks: Handle last-in-first-out scenarios
-
Hash Tables: Enable fast lookups and mappings
-
CSV Files: Store and exchange tabular data
Understanding when and how to use each data structure will make your programs more efficient and easier to understand.