0%

Tools for multi-thread programming for linux

Helgrind 和 DRD

這兩個工具都是Valgrind的一部分,用途也相同,檢查Thread error,不過用的策略不同,可以交替使用檢茶室否有無隱藏的錯誤。
以下是從Binary hacks抄下的範例

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#include <pthread.h>
static int count = 1;
void *incr_count(void *p) {
count++;
return 0;
}
static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
void *lock_m1_then_m2(void *p) {
pthread_mutex_lock(&m1);
pthread_mutex_lock(&m2);
pthread_mutex_unlock(&m2);
pthread_mutex_unlock(&m1);
return 0;
}
void *lock_m2_then_m1(void *p) {
pthread_mutex_lock(&m2);
pthread_mutex_lock(&m1);
pthread_mutex_unlock(&m1);
pthread_mutex_unlock(&m2);
return 0;
}
int main() {
pthread_t t1, t2, t3, t4;
pthread_create(&t1, NULL, incr_count, NULL);
pthread_create(&t2, NULL, incr_count, NULL);
pthread_create(&t3, NULL, lock_m1_then_m2, NULL);
pthread_create(&t4, NULL, lock_m2_then_m1, NULL);
pthread_join(t4, NULL);
pthread_join(t3, NULL);
pthread_join(t2, NULL);
pthread_join(t1, NULL);
return count;
}

裡面有兩個錯誤,一個是count在multi-thread的情況沒有保護,這種情況也可以用下面的thread-sanitizer偵測出來。
另外一種情況就是lock的順序不同,導致Deadlock的情景。
編譯且執行

1
2
$ gcc demo.c -o demo -lpthread
$ valgrind --tool=drd ./demo

輸出太長,列出感興趣的部份

==5172== Possible data race during write of size 4 at 0x600C90 by thread #3
==5172== Locks held: none
==5172== at 0x40065F: incr_count (in /home/hungming/a)
==5172== by 0x4C2DB38: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5172== by 0x4E3BE99: start_thread (pthread_create.c:308)
==5172==
==5172== This conflicts with a previous write of size 4 by thread #2
==5172== Locks held: none
==5172== at 0x40065F: incr_count (in /home/hungming/a)
==5172== by 0x4C2DB38: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5172== by 0x4E3BE99: start_thread (pthread_create.c:308)

上面這編列出可能有data-race的情形。

==5172== Thread #5: lock order “0x600CA0 before 0x600CC8” violated
==5172==
==5172== Observed (incorrect) order is: acquisition of lock at 0x600CC8
==5172== at 0x4C2DFCD: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5172== by 0x4006FB: lock_m2_then_m1 (in /home/hungming/a)
==5172== by 0x4C2DB38: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5172== by 0x4E3BE99: start_thread (pthread_create.c:308)
==5172==
==5172== followed by a later acquisition of lock at 0x600CA0
==5172== at 0x4C2DFCD: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5172== by 0x40070B: lock_m2_then_m1 (in /home/hungming/a)
==5172== by 0x4C2DB38: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5172== by 0x4E3BE99: start_thread (pthread_create.c:308)

這邊告訴我們lock的順序不對。
更多的使用方法可以參考
Helgrind使用說明
DRD使用說明

thread-sanitizer

thread-sanitizer現在已經是LLVM的一部分,在編譯LLVM的時候就會編譯完成,而GCC 4.8之後也支援thread-sanitizer。
這跟上面的不同是檢查data-race issue。
寫個sample code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include <pthread.h>
int Global;
void* Thread1(void* x) {
Global++;
return NULL;
}

void* Thread2(void* x) {
Global--;
return NULL;
}

int main() {
pthread_t t[2];
pthread_create(&t[0], NULL, Thread1, NULL);
pthread_create(&t[1], NULL, Thread2, NULL);
pthread_join(t[0], NULL);
pthread_join(t[1], NULL);
return 0;
}

這個範例很簡單,可以看出 Global 在不同Thread下操作可能出現問題。
編譯且執行,注意要加上-fsanitize=thread

1
2
$ clang simple_race.c -fsanitize=thread -g
$ ./a.out

同樣列出我們所關心的部份

WARNING: ThreadSanitizer: data race (pid=4441)
Location is global ‘Global’ of size 4 at 0x7f4d31e90ad8 (a+0x0000016caad8)
SUMMARY: ThreadSanitizer: data race ??:0 Thread2

有了Tool之後,從Log分西問題出在哪就便得很重要了。