Process, Thread, Coroutine: A Hands-on Experiment to Conquer Concurrency in Python and Golang

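Introduction: To really understand concurrency in modern programming, I briefly reviewed how processes, threads, and coroutines relate to one another, then wrote code to compare how Python and Golang handle CPU-bound and I/O-bound tasks. The results are striking: they show clearly where the Python GIL becomes a limit and where the Go runtime has the advantage.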

# Process, Thread, Coroutine

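Ranked by resource cost and degree of isolation, from largest to smallest, the units of concurrent execution are:

| Concept | Resource cost / Isolation | Key characteristics | Typical use |
| --- | --- | --- | --- |
| External Service / Daemon | Largest, system level (fully independent) | A fully independent execution environment that communicates over the network or IPC. | Microservice architectures, long-running background services, system resource isolation. |
| Process | Large (separate memory) | Scheduled by the OS; sidesteps the GIL to achieve parallelism. | CPU-bound tasks |
| Thread | Smaller (shared memory) | Scheduled by the OS; in CPython the GIL prevents threads from running Python bytecode in parallel. | I/O-bound tasks (the GIL is released while waiting) |
| Coroutine (Python coroutine / Goroutine) | Smallest (user-space switching) | Scheduled by the program or runtime with very low overhead; provides non-blocking concurrency. Python coroutines switch within a single thread, while goroutines are multiplexed across OS threads by the Go runtime. | High-concurrency I/O |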
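
The program-controlled switching of coroutines can be seen in a minimal Python sketch. The names `worker` and `switch_log` are illustrative and not part of the experiment, and `asyncio.sleep(0)` is used here only as an explicit yield point.

```python
import asyncio

async def worker(name: str, log: list[str]) -> None:
    """Append two steps to the shared log, yielding to the event loop between them."""
    for step in range(2):
        log.append(f"{name}:{step}")
        # Control returns to the event loop only here, at an explicit await.
        await asyncio.sleep(0)

async def main() -> list[str]:
    log: list[str] = []
    # Both coroutines run on one thread; they interleave at each await.
    await asyncio.gather(worker("a", log), worker("b", log))
    return log

switch_log = asyncio.run(main())
print(switch_log)  # ['a:0', 'b:0', 'a:1', 'b:1']
```

Because a coroutine gives up control only at an `await`, a coroutine that never awaits (for example a pure CPU loop) would hold the thread until it finishes.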

# CPU-bound vs I/O-bound

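| Property | CPU-bound | I/O-bound |
| --- | --- | --- |
| Main work | Computation, logic processing | Waiting on external resources (disk/network) |
| Bottleneck | CPU speed, number of cores | I/O device speed, network latency |
| CPU utilization | High (close to 100%) | Low (the CPU is often idle) |
| Optimization strategy | Multi-processing | Multi-threading or asynchronous I/O |
| Python considerations | Limited by the GIL; use multiple processes | The GIL is released during I/O waits, so multi-threading or async I/O works well |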
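
One rough way to tell which kind of workload you have is to compare CPU time with wall-clock time for a single call. The sketch below is an illustration under that assumption; `cpu_ratio` and `busy_loop` are hypothetical helper names, not part of the experiment.

```python
import time

def cpu_ratio(func, *args) -> float:
    """Return CPU time divided by wall-clock time for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used

def busy_loop(n: int) -> int:
    """A pure computation that never waits on anything."""
    total = 0
    for i in range(n):
        total += i * i
    return total

busy_ratio = cpu_ratio(busy_loop, 2_000_000)  # CPU-bound: expected to be close to 1
idle_ratio = cpu_ratio(time.sleep, 0.2)       # I/O-like wait: expected to be close to 0
print(f"busy: {busy_ratio:.2f}, idle: {idle_ratio:.2f}")
```

A ratio near 1 suggests the CPU is the bottleneck; a ratio near 0 suggests the call spends most of its time waiting.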

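# Python vs Golang: A Process, Thread, and Coroutine Experiment

This test uses two typical workloads to compare Python under the GIL with Golang's concurrency mechanisms.

Scenario A: CPU-bound task (cpu_bound_task):
- Logic: Run $N_{CPU} \times 500,000$ loop iterations of arithmetic.
- Goal: Keep a CPU core at $100\%$ load, and test whether the mechanism can spread the work across multiple cores in parallel.
Scenario B: I/O-bound task (io_bound_task / async_io_bound_task):
- Logic: Use time.sleep(DELAY_IO) (asyncio.sleep(DELAY_IO) in the coroutine version) to simulate a $0.5$ second wait on network latency or file I/O.
- Goal: Leave the CPU idle while waiting, and test whether the mechanism can switch efficiently to other tasks during the wait instead of blocking as a whole.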

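# 2. Test Goals and Logic: The 8x Challenge

Eight concurrent tasks ( $N_{WORKERS}=8$ ) are run in three different ways:

| Mechanism | Workload | Purpose | Expected result |
| --- | --- | --- | --- |
| Process | CPU-bound | Verify parallelism: can the work use 8 cores? | Total time $\approx 0.05\text{s}$ to $0.1\text{s}$ (serial time $\div 8$, plus process startup overhead) |
| Thread / Asyncio | CPU-bound | Verify the GIL limit: does the GIL make execution serial? | Total time $\approx 0.4\text{s}$ (sum of all task times) |
| Asyncio / Goroutine | I/O-bound | Verify switching efficiency: can all 8 tasks finish within $0.5\text{s}$? | Total time $\approx \mathbf{0.5\text{s}}$ (the longest single task) |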
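
The expected totals follow from simple arithmetic. The sketch below assumes each CPU task takes about 0.05 s on one core (an assumption derived from 0.4 s divided by 8 tasks) and ignores startup and scheduling overhead; `expected_times` is a hypothetical helper name.

```python
import math

def expected_times(n_tasks: int, task_seconds: float, cores: int, io_delay: float) -> dict[str, float]:
    """Idealized totals: no startup cost, no scheduling overhead."""
    return {
        # Tasks run in waves of `cores` at a time.
        "cpu_parallel": math.ceil(n_tasks / cores) * task_seconds,
        # Under the GIL the tasks effectively run one after another.
        "cpu_serial": n_tasks * task_seconds,
        # All waits overlap, so the total is one delay.
        "io_concurrent": io_delay,
    }

estimate = expected_times(n_tasks=8, task_seconds=0.05, cores=8, io_delay=0.5)
for name, seconds in estimate.items():
    print(f"{name}: {seconds:.2f}s")
```

Any measured time above these idealized figures is overhead, which is why a short task makes process startup cost so visible.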

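# 3. A Fair Test

Python multiprocessing.Process: Starts independent processes. In standard CPython this is the usual way to get around the GIL and run Python code in parallel on multiple cores.
Python asyncio.to_thread: Hands a blocking function to a background thread pool so the event loop stays responsive. It is intended mainly for blocking I/O; it is used here for the CPU task to show that its worker threads are still limited by the GIL.
Golang Goroutine: Go's core concurrency unit, managed by the runtime's M:N scheduler. The test checks how it performs in both scenarios.

Comparing Process (parallel) against Thread/Asyncio (serial) on the CPU task directly measures the performance gap caused by the GIL.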
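
The same idea can be checked in isolation by timing a CPU loop run serially and then in threads. This is a minimal sketch with hypothetical helper names (`count_up`, `time_serial`, `time_threaded`) and a smaller workload than the experiment; on a standard GIL build of CPython the speedup is expected to stay near 1x, while a free-threaded build may behave differently.

```python
import time
from threading import Thread

def count_up(n: int) -> None:
    """A pure-Python CPU loop that never releases the GIL voluntarily."""
    i = 0
    while i < n:
        i += 1

def time_serial(n_workers: int, n: int) -> float:
    start = time.perf_counter()
    for _ in range(n_workers):
        count_up(n)
    return time.perf_counter() - start

def time_threaded(n_workers: int, n: int) -> float:
    start = time.perf_counter()
    threads = [Thread(target=count_up, args=(n,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

serial = time_serial(4, 300_000)
threaded = time_threaded(4, 300_000)
print(f"serial {serial:.3f}s, threaded {threaded:.3f}s, speedup {serial / threaded:.2f}x")
```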

# Python

import time
import os
from multiprocessing import Process
from threading import Thread
import asyncio

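# --- Parameters ---
# Number of threads/processes/coroutines to launch
NUM_WORKERS = 8
# Intensity of the CPU-bound task (larger values run longer; adjust for your machine)
N_CPU = 2
# Simulated wait time of the I/O-bound task, in seconds
DELAY_IO = 0.5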

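# --- Task definitions ---

# 1. CPU-bound task: a time-consuming computation
def cpu_bound_task(n):
    """A computation that keeps the CPU busy."""
    i = 0
    while i < n * 500000:
        i += 1
        # A simple math operation to keep the CPU busy
        x = 51000**0.5
    print(f"[{os.getpid()}] CPU-Bound Task finished.")

# 2. I/O-bound task: simulated network wait
def io_bound_task(delay):
    """Simulate a network request by blocking for the given time."""
    time.sleep(delay)
    print(f"[{os.getpid()}] I/O-Bound Task finished after {delay}s.")

# 3. Coroutine version of the I/O-bound task
async def async_io_bound_task(delay):
    """Simulate a network request with a non-blocking await."""
    await asyncio.sleep(delay)
    print(f"[Asyncio] I/O-Bound Task finished after {delay}s.")

# 4. Coroutine version of the CPU-bound task (uses to_thread to hand the blocking work to a thread pool)
async def async_cpu_bound_task(n):
    """Run the CPU computation in a background thread so the event loop is not blocked."""
    # asyncio.to_thread runs the synchronous cpu_bound_task in a separate thread.
    # The event loop stays responsive, but the worker threads still share the GIL,
    # so the computations do not run in parallel.
    await asyncio.to_thread(cpu_bound_task, n)
    print("[Asyncio] CPU-Bound Task finished.")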

# --- Runner functions ---

# A. Run with processes
def run_processes(task, *args):
    start_time = time.time()
    processes = []
    for _ in range(NUM_WORKERS):
        p = Process(target=task, args=args)
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    end_time = time.time()
    return end_time - start_time

# B. Run with threads
def run_threads(task, *args):
    start_time = time.time()
    threads = []
    for _ in range(NUM_WORKERS):
        t = Thread(target=task, args=args)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    end_time = time.time()
    return end_time - start_time


# C. Run with coroutines (asyncio)
def run_asyncio(task, *args):
    start_time = time.time()

    # 1. Build the list of coroutine objects
    tasks = [task(*args) for _ in range(NUM_WORKERS)]

    # 2. Define a top-level coroutine that waits for all of them
    async def main_runner():
        # await is only valid inside an async function; gather waits for every task
        await asyncio.gather(*tasks)

    # 3. Run the event loop with the top-level coroutine (main_runner() returns a coroutine object)
    asyncio.run(main_runner())
    
    end_time = time.time()
    return end_time - start_time

# --- Main ---
if __name__ == "__main__":
    print(f"--- Python concurrency test (N={NUM_WORKERS}, CPU_N={N_CPU}, IO_D={DELAY_IO}) ---")

    # 1. CPU-bound comparison (Process vs Thread vs Asyncio)
    print("\n[--- CPU-bound test (computation) ---]")

    # [Python 1] Process: **best fit**, expected to be fastest because it sidesteps the GIL
    time_p_cpu = run_processes(cpu_bound_task, N_CPU)
    print(f"Process (CPU) Total Time: {time_p_cpu:.4f}s\n")

    # [Python 2] Thread: **poor fit**, expected to be slow because of the GIL
    time_t_cpu = run_threads(cpu_bound_task, N_CPU)
    print(f"Thread (CPU) Total Time: {time_t_cpu:.4f}s\n")

    # [Python 3] Asyncio: **poor fit**, expected to be close to Thread because the to_thread workers share the GIL
    time_a_cpu = run_asyncio(async_cpu_bound_task, N_CPU)
    print(f"Asyncio (CPU) Total Time: {time_a_cpu:.4f}s\n")

    # 2. I/O-bound comparison (Thread vs Asyncio)
    print("\n[--- I/O-bound test (network wait) ---]")

    # [Python 4] Thread: suitable, because the GIL is released while waiting on I/O
    time_t_io = run_threads(io_bound_task, DELAY_IO)
    print(f"Thread (I/O) Total Time: {time_t_io:.4f}s\n")

    # [Python 5] Asyncio: **best fit**, expected to be fastest because switching overhead is lowest
    time_a_io = run_asyncio(async_io_bound_task, DELAY_IO)
    print(f"Asyncio (I/O) Total Time: {time_a_io:.4f}s\n")

    # Processes carry more overhead for I/O tasks and are rarely used for them, so they are left out of the main comparison.

Output:

--- Python concurrency test (N=8, CPU_N=2, IO_D=0.5) ---

[--- CPU-bound test (computation) ---]
[41883] CPU-Bound Task finished.
[41884] CPU-Bound Task finished.
[41886] CPU-Bound Task finished.
[41888] CPU-Bound Task finished.
[41885] CPU-Bound Task finished.
[41887] CPU-Bound Task finished.
[41889] CPU-Bound Task finished.
[41890] CPU-Bound Task finished.
Process (CPU) Total Time: 0.1576s

[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
Thread (CPU) Total Time: 0.4108s

[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[Asyncio] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[Asyncio] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[41879] CPU-Bound Task finished.
[Asyncio] CPU-Bound Task finished.
[Asyncio] CPU-Bound Task finished.
[Asyncio] CPU-Bound Task finished.
[Asyncio] CPU-Bound Task finished.
[Asyncio] CPU-Bound Task finished.
[Asyncio] CPU-Bound Task finished.
Asyncio (CPU) Total Time: 0.4211s


[--- I/O-bound test (network wait) ---]
[41879] I/O-Bound Task finished after 0.5s.
[41879] I/O-Bound Task finished after 0.5s.
[41879] I/O-Bound Task finished after 0.5s.
[41879] I/O-Bound Task finished after 0.5s.
[41879] I/O-Bound Task finished after 0.5s.
[41879] I/O-Bound Task finished after 0.5s.
[41879] I/O-Bound Task finished after 0.5s.
[41879] I/O-Bound Task finished after 0.5s.
Thread (I/O) Total Time: 0.5069s

[Asyncio] I/O-Bound Task finished after 0.5s.
[Asyncio] I/O-Bound Task finished after 0.5s.
[Asyncio] I/O-Bound Task finished after 0.5s.
[Asyncio] I/O-Bound Task finished after 0.5s.
[Asyncio] I/O-Bound Task finished after 0.5s.
[Asyncio] I/O-Bound Task finished after 0.5s.
[Asyncio] I/O-Bound Task finished after 0.5s.
[Asyncio] I/O-Bound Task finished after 0.5s.
Asyncio (I/O) Total Time: 0.5020s

# Golang

package main

import (
    "fmt"
    "time"
    "runtime"
    "sync" 
)

// --- Parameters (kept in sync with the Python version) ---
const NUM_WORKERS = 8
const N_CPU = 2
const DELAY_IO = 0.5

// --- Task definitions ---

// 1. CPU-bound task (goroutine)
func cpuBoundTask(n int, id int, wg *sync.WaitGroup) {
    defer wg.Done()
    i := 0
    for i < n * 500000 {
        i++
        // Constant expression: evaluated at compile time, so it adds no runtime work
        _ = 51000.0 * 51000.0 / 0.5
    }
    fmt.Printf("[%d] CPU-Bound Goroutine %d finished.\n", id, id)
}

// 2. I/O-bound task (goroutine)
// delay_ms must be an int number of milliseconds
func ioBoundTask(delay_ms int, id int, wg *sync.WaitGroup) {
    defer wg.Done()

    // delay_ms is in milliseconds, so sleep in units of time.Millisecond
    time.Sleep(time.Duration(delay_ms) * time.Millisecond)

    // Convert milliseconds back to seconds for the message
    fmt.Printf("[%d] I/O-Bound Goroutine %d finished after %.1fs.\n", id, id, float64(delay_ms)/1000.0)
}

// --- Runner ---
func runGoroutines(task func(int, int, *sync.WaitGroup), task_param int) float64 {
    start := time.Now()
    var wg sync.WaitGroup
    
    for i := 1; i <= NUM_WORKERS; i++ {
        wg.Add(1) 
        go task(task_param, i, &wg) 
    }
    wg.Wait() 
    
    elapsed := time.Since(start)
    return elapsed.Seconds()
}

// --- Main ---
func main() {
    fmt.Printf("--- Golang Goroutine test (CPUs: %d, N=%d) ---\n", runtime.NumCPU(), NUM_WORKERS)

    // 1. CPU-bound comparison
    fmt.Println("\n[--- CPU-bound test (computation) ---]")

    // [Go 1] Goroutine (CPU): **suitable**, expected to be close to Python Process because it can use multiple cores
    time_g_cpu := runGoroutines(cpuBoundTask, N_CPU)
    fmt.Printf("Goroutine (CPU) Total Time: %.4f seconds.\n", time_g_cpu)

    // 2. I/O-bound comparison
    fmt.Println("\n[--- I/O-bound test (network wait) ---]")

    // [Go 2] Goroutine (I/O): **best fit**, expected to be close to Python Asyncio because switching is very cheap (lightweight concurrency)
    // DELAY_IO (0.5) is multiplied by 1000 to get milliseconds (500) and converted to int
    time_g_io := runGoroutines(ioBoundTask, int(DELAY_IO * 1000))
    fmt.Printf("Goroutine (I/O) Total Time: %.4f seconds.\n", time_g_io)
}

Output:

--- Golang Goroutine test (CPUs: 10, N=8) ---

[--- CPU-bound test (computation) ---]
[8] CPU-Bound Goroutine 8 finished.
[4] CPU-Bound Goroutine 4 finished.
[2] CPU-Bound Goroutine 2 finished.
[1] CPU-Bound Goroutine 1 finished.
[3] CPU-Bound Goroutine 3 finished.
[6] CPU-Bound Goroutine 6 finished.
[5] CPU-Bound Goroutine 5 finished.
[7] CPU-Bound Goroutine 7 finished.
Goroutine (CPU) Total Time: 0.0008 seconds.

[--- I/O-bound test (network wait) ---]
[1] I/O-Bound Goroutine 1 finished after 0.5s.
[3] I/O-Bound Goroutine 3 finished after 0.5s.
[8] I/O-Bound Goroutine 8 finished after 0.5s.
[5] I/O-Bound Goroutine 5 finished after 0.5s.
[6] I/O-Bound Goroutine 6 finished after 0.5s.
[2] I/O-Bound Goroutine 2 finished after 0.5s.
[4] I/O-Bound Goroutine 4 finished after 0.5s.
[7] I/O-Bound Goroutine 7 finished after 0.5s.
Goroutine (I/O) Total Time: 0.5024 seconds.

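# Why Is Python's Output Long and Scattered While Golang's Is Short and Grouped?

# 1. Why Python prints more: more test runs, flushed line by line

Python's output is longer and more interleaved for the following reasons:

More test runs: The Python script runs five tests (Process, Thread, and Asyncio on the CPU task; Thread and Asyncio on the I/O task), and the Asyncio CPU test prints two lines per task. That gives 48 task lines, against 16 for the two Go tests.
Line buffering: When standard output is a terminal, Python's print() is line-buffered, so each line is sent to the operating system as soon as its newline (\n) is written. When output is redirected to a file or pipe, it is block-buffered instead.
Competing execution units: In the Process, Thread, and Asyncio tests, each unit writes to standard output as soon as it finishes, so lines appear in completion order rather than launch order.

Summary: Python's output is scattered because every execution unit reports its completion immediately.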
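
The effect of line buffering can be reproduced without a terminal. The sketch below wraps an in-memory `io.BytesIO`, used here purely as a stand-in for the real stdout file descriptor, in an `io.TextIOWrapper` and checks what has reached it before and after `flush()`. The helper name `bytes_delivered` is hypothetical.

```python
import io

def bytes_delivered(line_buffering: bool) -> tuple[bytes, bytes]:
    """Write one line and report what reached the underlying stream before and after flush()."""
    sink = io.BytesIO()  # stands in for the OS-level file descriptor
    stream = io.TextIOWrapper(sink, encoding="utf-8", newline="\n",
                              line_buffering=line_buffering)
    stream.write("CPU-Bound Task finished.\n")
    before_flush = sink.getvalue()
    stream.flush()
    after_flush = sink.getvalue()
    return before_flush, after_flush

line_mode = bytes_delivered(line_buffering=True)    # line reaches the sink immediately
block_mode = bytes_delivered(line_buffering=False)  # line waits in the buffer until flush()
print(line_mode)
print(block_mode)
```

When output is piped and ordering matters, `print(..., flush=True)` forces the same immediate delivery for a single call.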

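# 2. Why Golang prints less: fewer tests, near-simultaneous completion

Golang's output is shorter and looks grouped, but not because the runtime batches it:

Unbuffered writes: fmt.Printf writes straight to os.Stdout, which is not buffered in user space, so each call issues its own write. The Go runtime does not collect output from multiple goroutines into batches.
Fast parallel execution: The goroutines finish the CPU task within a fraction of a millisecond of each other, so their lines reach the terminal so close together that they look like one batch, even though each was written separately.

Summary: Golang's output looks grouped because there are fewer tests and the goroutines finish almost simultaneously, not because of I/O batching in the Go runtime.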

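# Results

# 1. CPU-Bound Performance: Quantifying the Cost of the Python GIL

| Mechanism | Total Time | Ratio (relative to Goroutine) | Conclusion |
| --- | --- | --- | --- |
| Golang Goroutine | $0.0007\text{s} \sim 0.0008\text{s}$ | Baseline (1.0x) | Parallel and compiled: the goroutines are spread across cores by the Go runtime's M:N scheduler, and the loop is compiled native code whose constant arithmetic is folded at compile time. |
| Python Process | $0.1576\text{s}$ | $\approx \mathbf{200x}$ slower | Parallel with high overhead: the work does run in parallel, but process startup and isolation costs dominate such a short task. Much of the gap to Go also reflects interpreted versus compiled loops. |
| Python Thread/Asyncio | $\approx 0.41\text{s}$ | $\approx \mathbf{500x}$ slower | GIL penalty: the total equals the sum of the eight tasks' run times, showing that the GIL serializes CPU-bound threads even on a multi-core machine. |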
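
The ratios can be recomputed from the timings printed in the output listings above. This sketch uses only those measured values; the variable names are illustrative.

```python
# Total times in seconds, copied from the Python and Go output listings.
measured = {
    "Golang Goroutine": 0.0008,
    "Python Process": 0.1576,
    "Python Thread": 0.4108,
    "Python Asyncio": 0.4211,
}

baseline = measured["Golang Goroutine"]
ratios = {name: seconds / baseline for name, seconds in measured.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x")

# How much Python gained from real parallelism, independent of Go.
process_speedup = measured["Python Thread"] / measured["Python Process"]
print(f"Process speedup over Thread: {process_speedup:.1f}x")
```

The Process speedup over Thread comes out at roughly 2.6x rather than the ideal 8x, which is consistent with process startup cost taking a large share of such a short run.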

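# 2. I/O-Bound Performance: Lightweight Concurrency Is Equivalent

| Mechanism | Total Time | Analysis |
| --- | --- | --- |
| Golang Goroutine | $\approx 0.501\text{s}$ | Baseline: efficient and stable, with negligible switching overhead. |
| Python Asyncio | $\approx 0.502\text{s}$ | On par: Python coroutines handle I/O waits with efficiency in the same range as goroutines. |
| Python Thread | $\approx 0.506\text{s}$ | Slight overhead: a few milliseconds more in this run, consistent with OS threads costing more to create and schedule than user-level coroutines, although a difference this small is within the noise of a single run. |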
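
Differences of a few milliseconds are easier to trust when each measurement is repeated. The following is a minimal sketch, assuming a shortened 0.05 s delay to keep it quick; `sleep_all` and `median_runtime` are hypothetical helper names, not part of the original script.

```python
import asyncio
import statistics
import time

async def sleep_all(n_workers: int, delay: float) -> None:
    """Run n_workers simulated I/O waits concurrently."""
    await asyncio.gather(*(asyncio.sleep(delay) for _ in range(n_workers)))

def median_runtime(n_workers: int, delay: float, repeats: int = 5) -> float:
    """Time the concurrent waits several times and return the median in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        asyncio.run(sleep_all(n_workers, delay))
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

median_seconds = median_runtime(n_workers=8, delay=0.05)
print(f"median over 5 runs: {median_seconds:.4f}s")
```

The median is less sensitive than a single run to one-off delays such as a scheduler hiccup or a cold start.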

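# 3. Terminal Output: Buffering and Scheduling

The visible difference between the Go and Python terminal output comes from how each program writes to standard output and how many tests it runs, not from a difference in I/O efficiency:

Golang (grouped output): fmt.Printf writes to os.Stdout without user-space buffering. The lines look grouped only because the goroutines finish within a fraction of a millisecond of each other.
Python (scattered output): print() is line-buffered on a terminal, which favors immediacy and debuggability. With multiple processes or threads, lines from different units interleave in completion order; the cost of these writes is small next to the task times measured here.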

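# Conclusion: A Choice of Programming Philosophy

This small benchmark offers some guidance for choosing a language and architecture:

CPU-bound: Golang's goroutines were far faster in this test, helped by compiled code and multi-core scheduling. In standard CPython, parallel CPU work generally means paying the higher overhead of multiprocessing.
I/O-bound: Python's Asyncio and Golang's goroutines are both excellent solutions with essentially equal efficiency. The choice comes down to the ecosystem or the need for a single language across the stack.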