The Blood and Tears of Writing Multithreaded C/C++ Programs: Four Mistakes to Avoid

Apr 21, 2019

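It's been ages since I last wrote a post XD... I'm using the weekend to write down what I learned this year about the things to watch out for, and avoid, when writing multithreaded code in C/C++.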

Others have already covered the more common mistakes; see Top 20 C++ multithreading mistakes and how to avoid them. This post covers some trickier ones.

Mistake 1: Calling fork in a multithreaded program
Mistake 2: Using pthread_cancel to stop a thread
Mistake 3: Using detach and letting a thread fend for itself
Mistake 4: Multiple threads operating on the same shared_ptr

Mistake 1: Calling fork in a multithreaded program

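In the Linux world, fork() creates a new process, which is used for things like spawning a daemon or running an asynchronous background task. If you call fork in a multithreaded program, only the calling thread is duplicated into the child process; none of the other threads exist in the child. The state of locks, condition variables, and other pthread objects, however, is copied into the child as-is. If some other thread happens to be holding a lock at the moment of the fork, that thread is gone in the child, and you get a kind of deadlock the textbooks never taught: nobody is left to release the lock.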

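There are a few ways to more or less avoid this, for example:

Stop all other threads before fork, and have the parent start them again afterward
Have the forking thread acquire and hold every lock before fork, then release them all afterward (in both the parent and the child)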
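
The second workaround can be implemented with the POSIX pthread_atfork() function (not mentioned above), which registers handlers that run around every fork(). The sketch below assumes Linux/glibc and a single hypothetical mutex, g_state_mutex; install_fork_handlers() and child_lock_status() are illustrative names. A real program would have to take every relevant lock this way, in a consistent order.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical lock guarding some state shared by the parent's threads.
static pthread_mutex_t g_state_mutex = PTHREAD_MUTEX_INITIALIZER;

// prepare: runs in the parent, in the thread calling fork(), just before the
// fork. Holding the lock guarantees no other thread is inside the critical
// section when the address space is copied.
static void before_fork() { pthread_mutex_lock(&g_state_mutex); }

// parent / child: run right after fork() returns in each process. Each process
// has its own copy of the locked mutex, owned by the forking thread, so each
// process releases its own copy.
static void after_fork_in_parent() { pthread_mutex_unlock(&g_state_mutex); }
static void after_fork_in_child() { pthread_mutex_unlock(&g_state_mutex); }

// Register the handlers once at startup; registering them twice would make
// before_fork() lock the same non-recursive mutex twice and deadlock.
int install_fork_handlers() {
    return pthread_atfork(before_fork, after_fork_in_parent, after_fork_in_child);
}

// Demo: fork and report whether the child could take the lock (exit code 0)
// or found it still held (exit code 1). Returns -1 on error.
int child_lock_status() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&g_state_mutex);
        _exit(rc == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The catch is the one the post points out: every library and module that owns a lock must cooperate, which is hard to guarantee in a large program.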

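In practice, both are quite hard to pull off. The most practical approach is to call one of the exec* system calls right after fork, turning the child into a brand-new process that carries on the work, so no unexpected inherited state gets in the way. POSIX in fact only allows the child of a multithreaded process to call async-signal-safe functions until it calls one of the exec functions.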
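
A minimal sketch of the fork-then-exec approach, assuming a POSIX system; spawn_program() and wait_exit_code() are hypothetical helper names. Between fork() and execv() the child does nothing except, if exec fails, call _exit().

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork, then immediately turn the child into a new program with execv().
// Between fork() and execv() the child calls only execv() and _exit(), both
// async-signal-safe, so it never touches a lock that another thread of the
// parent may have been holding at the moment of the fork.
// argv is built by the caller *before* fork(): malloc() is not
// async-signal-safe, so the child should not allocate.
pid_t spawn_program(const char* path, char* const argv[]) {
    pid_t pid = fork();
    if (pid == 0) {
        execv(path, argv);
        _exit(127);  // only reached if execv() failed; _exit skips atexit handlers and stdio flushing
    }
    return pid;  // in the parent: the child's pid, or -1 if fork() failed
}

// Wait for the child and return its exit code, or -1 on error or abnormal exit.
int wait_exit_code(pid_t pid) {
    if (pid < 0) return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with argv holding "sh", "-c", "exit 3" (as writable char arrays) followed by nullptr, wait_exit_code(spawn_program("/bin/sh", argv)) should return 3. posix_spawn() from <spawn.h> packages the same fork-then-exec sequence into a single call and may be the simpler option.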

Mistake 2: Using pthread_cancel to stop a thread

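Terminating a thread is a genuinely hard problem (trust me). After starting a thread to do some work, we may want to kill it because it is taking too long, or stop the other threads because one of them has already accomplished the task. pthread offers a function that looks very handy for this, pthread_cancel: call it, and the thread is seemingly killed.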

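pthread_cancel is convenient but comes with high risk. The target thread does not necessarily die when it receives the cancellation request: with the default (deferred) cancellation type, it only terminates when it reaches a cancellation point, one of the functions POSIX designates as such, many of which are calls that can block, such as the read()/write() system calls. It also runs into the same problem as Mistake 1: some state may never be released properly. For example, if a thread acquires a lock and then calls read/write, and it is cancelled there, the thread is gone but the lock is still held.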

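A few possible ways to avoid this:

Use pthread_setcancelstate to control manually when cancellation is enabled or disabled
Use pthread_cleanup_push to register cleanup handlers that release resources if the thread is cancelled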
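
The cleanup-handler approach can be sketched as follows, assuming Linux with glibc compiled as C++ (g++ -pthread), where pthread_cleanup_push/pthread_cleanup_pop are macros that must be paired in the same lexical scope. The worker holds a mutex while blocked in sleep(), which is a cancellation point, and the registered handler releases the mutex when the thread is cancelled. g_mutex, worker() and cancel_releases_lock() are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> g_locked{false};

// Cleanup handler: runs if the thread is cancelled (or calls pthread_exit)
// while it is registered, i.e. between pthread_cleanup_push and _pop.
static void unlock_mutex(void* arg) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t*>(arg));
}

static void* worker(void*) {
    pthread_mutex_lock(&g_mutex);
    pthread_cleanup_push(unlock_mutex, &g_mutex);
    g_locked = true;
    for (;;) {
        sleep(1);  // sleep() is a cancellation point: a pending cancel is acted on here
    }
    pthread_cleanup_pop(1);  // never reached, but push/pop must be paired in the same scope
    return nullptr;
}

// Cancel a worker that is blocked while holding g_mutex, then check that the
// cleanup handler released the lock. Returns true if it did.
bool cancel_releases_lock() {
    g_locked = false;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, nullptr) != 0) return false;
    while (!g_locked) sched_yield();  // wait until the worker holds the mutex
    pthread_cancel(tid);
    void* result = nullptr;
    pthread_join(tid, &result);
    // Without the cleanup handler, the mutex would stay locked forever.
    bool released = pthread_mutex_trylock(&g_mutex) == 0;
    if (released) pthread_mutex_unlock(&g_mutex);
    return result == PTHREAD_CANCELED && released;
}
```

The handler only covers the one lock it was registered for; every resource acquired inside a cancellable region needs its own handler, which is exactly why this approach is hard to keep under control.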

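Both approaches are hard to get right, so I recommend managing termination yourself instead, for example with the common Two-Phase Termination pattern (I'll cover it in a future post; look it up for now): use a variable to indicate whether the thread should keep running, let the thread check it and exit on its own, and have the main thread simply join it.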
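
A minimal sketch of the stop-flag idea, using a std::atomic<bool> that the worker polls between units of work. The Worker class is a hypothetical example; a full Two-Phase Termination implementation also has to handle threads blocked in long waits (for instance by waking them through a condition variable), which this sketch does not cover.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Cooperative termination: the worker polls a stop flag and exits on its own;
// the owner requests the stop and then joins.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() { stop(); }

    // Phase 1: request termination. Phase 2: wait for the thread to finish.
    void stop() {
        stop_requested_.store(true);
        if (thread_.joinable()) thread_.join();
    }
    long iterations() const { return iterations_.load(); }

private:
    void run() {
        while (!stop_requested_.load()) {
            ++iterations_;  // one short unit of work, so the flag is checked often
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // Normal return: locks and destructors are released as usual.
    }

    std::atomic<bool> stop_requested_{false};
    std::atomic<long> iterations_{0};
    std::thread thread_;  // declared last so the members above are initialized before run() starts
};
```

If C++20 is available, std::jthread together with std::stop_token provides a standard version of the same request-then-join pattern.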

Mistake 3: Using detach and letting a thread fend for itself

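Waiting for a thread to finish is a hassle, and the fact that join blocks is somewhat annoying. Calling detach, so the thread reclaims its own resources when it finishes and nobody has to call join, looks tempting, but in practice it hides several landmines that are easy to step on.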

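First, once a thread is detached, programmers basically stop tracking whether it is still running, and it may well run far, far longer than expected, outliving the variables it uses. For example, someone may write a function that captures local variables by reference in lambdas and starts several threads to work on them in parallel. If the function does not join all of those threads before returning, some of them may keep using those variables after the function has returned and its locals are gone, and the whole program falls into undefined behavior.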
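
To illustrate the lambda case, here is a sketch of a hypothetical parallel_sum() function whose threads capture locals by reference. It is only correct because every thread is joined before the function returns; with detach() in place of join(), the threads could still be writing into partial after it has been destroyed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Split the sum of `data` across `n_threads` threads. The lambdas capture
// `data`, `partial` and `chunk` by reference, which is only safe because every
// thread is joined below, before those locals go out of scope.
long parallel_sum(const std::vector<int>& data, std::size_t n_threads) {
    if (n_threads == 0) n_threads = 1;
    std::vector<long> partial(n_threads, 0);  // one slot per thread: no data race
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + n_threads - 1) / n_threads;
    for (std::size_t t = 0; t < n_threads; ++t) {
        threads.emplace_back([&, t] {
            std::size_t begin = t * chunk;
            std::size_t end = std::min(begin + chunk, data.size());
            for (std::size_t i = begin; i < end; ++i) partial[t] += data[i];
        });
    }
    for (auto& th : threads) th.join();  // all threads finish before the locals die
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```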

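A case that is harder to hit, but one I have actually run into more often, shows up in large systems: the main thread typically spends its final moments shutting down and releasing resources. A detached thread that runs for a long time may still be running after main returns, while the process is destroying global variables and singletons, and it ends up accessing objects that have already been destructed. If your program tends to core dump on exit, check whether you detached a thread or forgot to join one.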

There is only one way to avoid this mistake: join your threads explicitly!
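
One way to make explicit joining hard to forget is to have a single owner join every thread it started before shutdown proceeds. ThreadGroup below is a hypothetical helper, not a standard type, whose destructor joins all of its threads; creating it inside main() means the joins happen before main returns and global objects are destroyed. C++20's std::jthread offers similar join-on-destruction behavior for individual threads.

```cpp
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Owns background threads and joins all of them when destroyed, so none can
// still be running while globals and singletons are being torn down.
class ThreadGroup {
public:
    ThreadGroup() = default;
    ThreadGroup(const ThreadGroup&) = delete;
    ThreadGroup& operator=(const ThreadGroup&) = delete;
    ~ThreadGroup() { join_all(); }

    // Start a new thread running f; the group keeps ownership of it.
    template <typename F>
    void spawn(F&& f) { threads_.emplace_back(std::forward<F>(f)); }

    // Explicitly wait for every thread; safe to call more than once.
    void join_all() {
        for (auto& t : threads_)
            if (t.joinable()) t.join();
        threads_.clear();
    }

private:
    std::vector<std::thread> threads_;
};
```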

Mistake 4: Multiple threads operating on the same shared_ptr

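This one is really a shared_ptr misuse, but people keep forgetting about it, because copying a raw pointer without a lock usually seems harmless, whereas copying a shared_ptr while another thread modifies that same shared_ptr is not thread-safe. If one thread in your program may assign a new value to shared_ptr A while another thread obtains a copy of A through a getter function, you will probably run into some strange crashes...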

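It's easy to assume that shared_ptr is thread-safe, but it only partly is: the reference count updates are thread-safe, but operating on the same shared_ptr object itself (the object itself is the key point) from several threads, with at least one of them modifying it, is not. The reason is that a shared_ptr is typically implemented with two pointers, one to the managed object and one to the reference counter, and operations on these two pointers are not protected by any lock.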

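Take shared_ptr A above as an example, and suppose it is the last shared_ptr pointing to obj1. When A is set to nullptr, the reference count drops to 0, and both obj1 and the reference counter are deleted. If, while this is happening, another thread creates shared_ptr B by copying A, it may copy a pointer to a reference counter that no longer exists, and from then on the program cannot run correctly.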

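shared_ptr does guarantee that two distinct shared_ptr objects pointing to the same object can be used by two threads at the same time. So my advice is to avoid any concurrent access to a single shared_ptr: take a lock, copy the shared_ptr, release the lock, and then hand the copy to the other thread. That effectively avoids the problem.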
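
The lock, copy, unlock advice can be sketched with a small hypothetical wrapper, SharedSlot, that only touches the shared_ptr object itself while holding a mutex; readers get their own copy and use it after the lock is released. As alternatives, C++11 provides std::atomic_load/std::atomic_store overloads for shared_ptr (deprecated in C++20), and C++20 adds std::atomic<std::shared_ptr<T>>.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Holds a shared_ptr that one thread replaces while others read it.
// The shared_ptr object itself is only touched under the mutex.
template <typename T>
class SharedSlot {
public:
    std::shared_ptr<T> get() const {
        std::lock_guard<std::mutex> lock(mu_);
        return ptr_;  // the copy is made while holding the lock
    }
    void set(std::shared_ptr<T> p) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            ptr_.swap(p);  // p now holds the old value
        }
        // If p held the last reference, the old object is destroyed here,
        // outside the lock, when p goes out of scope.
    }

private:
    mutable std::mutex mu_;
    std::shared_ptr<T> ptr_;
};

// Demo: one writer keeps replacing the value while the calling thread reads
// copies; returns true if every copy the reader saw was valid.
bool writer_reader_demo(SharedSlot<int>& slot, int rounds) {
    slot.set(std::make_shared<int>(0));
    std::thread writer([&slot, rounds] {
        for (int i = 1; i <= rounds; ++i) slot.set(std::make_shared<int>(i));
    });
    bool all_valid = true;
    for (int i = 0; i < rounds; ++i) {
        std::shared_ptr<int> local = slot.get();  // private copy, usable without the lock
        if (!local || *local < 0 || *local > rounds) all_valid = false;
    }
    writer.join();
    return all_valid;
}
```

Destroying the old object outside the lock also keeps a potentially slow destructor from running while other threads wait on the mutex.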

If your program keeps crashing in shared_ptr's destructor, check whether this is the cause.

Those are the four mistakes I wanted to share. Here's hoping none of us runs into multithreading crashes again!

Written by Jack Yu
