Valgrind memcheck usage analysis

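I have been using valgrind a lot lately, so here are some notes on how I use it and how I read its output.

Starting with the simplest example

Here is a program with an obvious bug: it allocates memory and never frees it.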

#include <stdlib.h>
int main()
{
        int *p = malloc(32768);
        return 0;
}

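Compile it and run it under valgrind, and you will see output similar to the following.

$ gcc leak.c -g -o leak
$ valgrind --leak-check=full ./leak

The 3413 at the start of each line is the PID. The heap summary tells us that 32,768 bytes in 1 block were still in use at exit, and that the program made 1 allocation and 0 frees.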

==3413== HEAP SUMMARY:
==3413==     in use at exit: 32,768 bytes in 1 blocks
==3413==   total heap usage: 1 allocs, 0 frees, 32,768 bytes allocated
==3413== 
==3413== 32,768 bytes in 1 blocks are definitely lost in loss record 1 of 1
==3413==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3413==    by 0x40053E: main (leak.c:4)
==3413== 
==3413== LEAK SUMMARY:
==3413==    definitely lost: 32,768 bytes in 1 blocks
==3413==    indirectly lost: 0 bytes in 0 blocks
==3413==      possibly lost: 0 bytes in 0 blocks
==3413==    still reachable: 0 bytes in 0 blocks
==3413==         suppressed: 0 bytes in 0 blocks
==3413== 
==3413== For counts of detected and suppressed errors, rerun with: -v
==3413== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
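For comparison, here is a minimal sketch of the fix. The helper name `allocate_and_release` is my own and does not come from leak.c; it makes the same 32768-byte allocation but frees it before returning, so valgrind should have nothing to report as definitely lost.

```cpp
#include <cstdlib>

// Hypothetical helper (not part of leak.c): same allocation as the
// leaking program, but the block is released before we return.
bool allocate_and_release() {
    int *p = static_cast<int *>(std::malloc(32768));
    if (p == nullptr) {
        return false;  // allocation failed, nothing to free
    }
    p[0] = 42;     // touch the block so the allocation is actually used
    std::free(p);  // the call that leak.c is missing
    return true;
}
```

Calling this from main and running the program under valgrind with --leak-check=full is a quick way to confirm that allocs and frees are balanced.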

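valgrind sorts leaks into several categories.

definitely lost: this is certainly a leak. No analysis needed, it must be fixed.

possibly lost: this may be a leak. Whether it really is depends on how the language lets you handle pointers, so it needs careful analysis.
Here is a possibly lost example, where valgrind thinks there may be a problem.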

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
        char* s = "string";
        // this will allocate a new array
        char* p = strdup(s);
        // move the pointer into the array
        // we know we can reset the pointer by subtracting
        // but for valgrind the array is now lost
        p += 1;
        // deliberately trigger a segfault to crash the program
        *s = 'S';
        // reset the pointer to the beginning of the array
        p -= 1;
        // properly free the memory for the array
        free(p);
        return 0;
}

Here is the output.

==4422== 7 bytes in 1 blocks are possibly lost in loss record 1 of 1
==4422==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4422==    by 0x4EC0679: strdup (strdup.c:42)
==4422==    by 0x40059F: main (invalid.c:8)
==4422== 
==4422== LEAK SUMMARY:
==4422==    definitely lost: 0 bytes in 0 blocks
==4422==    indirectly lost: 0 bytes in 0 blocks
==4422==      possibly lost: 7 bytes in 1 blocks
==4422==    still reachable: 0 bytes in 0 blocks
==4422==         suppressed: 0 bytes in 0 blocks

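By the time of the segfault, p had already been moved, so the only pointer left points into the middle of the block. valgrind cannot be sure about the state of this memory block, so it reports it as possibly lost.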
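One way to avoid this kind of report is to keep the pointer returned by the allocator untouched and walk the block with a second pointer. The sketch below is only an illustration: `tail_length` is a made-up helper, and it uses malloc plus memcpy as a stand-in for strdup.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies s to the heap, then looks at the copy
// starting from its second character, and returns that tail's length.
std::size_t tail_length(const char *s) {
    assert(s != nullptr);
    const std::size_t len = std::strlen(s);
    char *base = static_cast<char *>(std::malloc(len + 1));  // stand-in for strdup
    if (base == nullptr) {
        return 0;
    }
    std::memcpy(base, s, len + 1);  // copy the characters and the terminator

    // Move a separate cursor; base keeps pointing at the start of the
    // block, so a start-pointer exists for the block's whole lifetime.
    const char *cursor = (len > 0) ? base + 1 : base;
    const std::size_t n = std::strlen(cursor);

    std::free(base);  // free through the original pointer
    return n;
}
```

If the program crashed while cursor was in use, base would still point at the start of the block, so I would expect valgrind to classify it as still reachable rather than possibly lost.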

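still reachable: this is memory that the program could still reach just before the process ended. Since all of a process's memory is reclaimed once it exits, you need to decide case by case whether it deserves special handling.
Here is an example.

int *p = malloc(10);
exit(0);

Result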

==4469== HEAP SUMMARY:
==4469==     in use at exit: 10 bytes in 1 blocks
==4469==   total heap usage: 1 allocs, 0 frees, 10 bytes allocated
==4469== 
==4469== 10 bytes in 1 blocks are still reachable in loss record 1 of 1
==4469==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4469==    by 0x400595: main (invalid.c:6)
==4469== 
==4469== LEAK SUMMARY:
==4469==    definitely lost: 0 bytes in 0 blocks
==4469==    indirectly lost: 0 bytes in 0 blocks
==4469==      possibly lost: 0 bytes in 0 blocks
==4469==    still reachable: 10 bytes in 1 blocks
==4469==         suppressed: 0 bytes in 0 blocks
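If a still reachable block does turn out to be worth cleaning up, one option is to register a cleanup function with atexit, which runs when exit() is called. This is a sketch under my own assumptions: `g_buffer`, `init_buffer` and `release_buffer` are hypothetical names, not part of the example above.

```cpp
#include <cassert>
#include <cstdlib>

namespace {
int *g_buffer = nullptr;  // hypothetical long-lived buffer
}

// Runs during exit(), so the block is freed before the process ends.
void release_buffer() {
    std::free(g_buffer);
    g_buffer = nullptr;
}

// Allocates the buffer and registers its cleanup. Returns false if the
// allocation or the atexit registration fails.
bool init_buffer() {
    assert(g_buffer == nullptr);  // only initialise once
    g_buffer = static_cast<int *>(std::malloc(10));
    if (g_buffer == nullptr) {
        return false;
    }
    return std::atexit(release_buffer) == 0;
}
```

Whether this is worth the extra code is a judgment call; for a short-lived process, leaving the block to the operating system is often acceptable.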

Common errors

Mismatch between malloc (new/new[]) and free (delete/delete[])

int *p = new int;
delete [] p;

Output

==3558== Mismatched free() / delete / delete []
==3558==    at 0x4C2C83C: operator delete[](void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3558==    by 0x400675: main (mismatch.cpp:5)
==3558==  Address 0x5a1d040 is 0 bytes inside a block of size 4 alloc'd
==3558==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3558==    by 0x40065E: main (mismatch.cpp:4)

Address 0x5a1d040 is the address of the memory we allocated. Although the allocation and deallocation calls do not match, this does not cause a memory leak.

==3558== HEAP SUMMARY:
==3558==     in use at exit: 0 bytes in 0 blocks
==3558==   total heap usage: 1 allocs, 1 frees, 4 bytes allocated
==3558== 
==3558== All heap blocks were freed -- no leaks are possible

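This example is too simple, though. Mismatching the calls like this is not guaranteed to be harmless in every program; I will discuss why it gets away with it here when I have time.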
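For reference, the matching forms look like this. The function name `sum_with_matched_forms` is made up for the illustration; the point is only that new pairs with delete and new[] pairs with delete[].

```cpp
// Hypothetical example: every allocation is released with the form
// that matches how it was created.
int sum_with_matched_forms() {
    int *single = new int(7);          // scalar new
    int *array = new int[3]{1, 2, 3};  // array new

    const int total = *single + array[0] + array[1] + array[2];

    delete single;   // scalar delete for scalar new
    delete[] array;  // array delete for array new
    return total;
}
```

With the forms paired this way, valgrind should not print the "Mismatched free() / delete / delete []" message.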

Double free

The well-known double free example is covered in the Invalid free section below.

Invalid Read / Write

int *p = new int[10];
p[10] = 100;
int v = p[11];

Result

==3687== Invalid write of size 4
==3687==    at 0x4006D1: main (invalid.cpp:7)
==3687==  Address 0x5a1d068 is 0 bytes after a block of size 40 alloc'd
==3687==    at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3687==    by 0x4006AE: main (invalid.cpp:5)
==3687== 
==3687== Invalid read of size 4
==3687==    at 0x4006DB: main (invalid.cpp:8)
==3687==  Address 0x5a1d06c is 4 bytes after a block of size 40 alloc'd
==3687==    at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3687==    by 0x4006AE: main (invalid.cpp:5)

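The first message says 0x5a1d068 is 0 bytes after a block of size 40, so the block we were given starts at 0x5a1d068 - 40 = 0x5a1d040. valgrind tells us the read at 0x5a1d06c is out of bounds: 0x5a1d06c - 0x5a1d040 == 0x2c == 44 (dec), and with a 4-byte int that is exactly index 11 of the int array, i.e. the p[11] read.
The invalid write works the same way: 0x5a1d068 - 0x5a1d040 == 0x28 == 40 (dec), which is index 10, i.e. the p[10] write. We can also see

Address 0x5a1d068 is 0 bytes after a block of size 40 alloc'd
Address 0x5a1d06c is 4 bytes after a block of size 40 alloc'd

This means the write landed right at the end of the 40-byte allocation, while the read was 4 bytes past it, which matches the analysis above.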
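The offset arithmetic can be checked mechanically: subtract the block's start address from the faulting address and divide by the element size. The helper below, `element_index`, is a hypothetical name I made up for this check; the addresses are the ones from the valgrind output above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts a faulting address reported by valgrind
// into an array index, given the block's start and the element size.
std::size_t element_index(std::uintptr_t block_start,
                          std::uintptr_t addr,
                          std::size_t elem_size) {
    assert(addr >= block_start);
    assert(elem_size > 0);
    return (addr - block_start) / elem_size;
}
```

With a 4-byte int, element_index(0x5a1d040, 0x5a1d068, 4) gives 10 (the write) and element_index(0x5a1d040, 0x5a1d06c, 4) gives 11 (the read).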

Invalid free

The well-known double free example

int *p = (int *)malloc(100);
free(p);
free(p);

After the previous example, the log is much less confusing.

==3956== Invalid free() / delete / delete[] / realloc()
==3956==    at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3956==    by 0x400600: main (invalid.cpp:9)
==3956==  Address 0x51fd040 is 0 bytes inside a block of size 100 free'd
==3956==    at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3956==    by 0x4005F4: main (invalid.cpp:8)
==3956== HEAP SUMMARY:
==3956==     in use at exit: 0 bytes in 0 blocks
==3956==   total heap usage: 1 allocs, 2 frees, 100 bytes allocated

Look closely: the HEAP SUMMARY tells us we allocated only once but freed twice.
And this line

Address 0x51fd040 is 0 bytes inside a block of size 100 free'd

tells us that 0x51fd040 is the start of a 100-byte block, but that block has already been freed.
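A common defensive habit is to set the pointer to null right after freeing it, since free on a null pointer does nothing. This is a sketch, not the only way to handle it; `release` and `release_twice_safely` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: frees the block and clears the caller's pointer,
// so a second call ends up as free(nullptr), which is a no-op.
void release(int *&p) {
    std::free(p);
    p = nullptr;
    assert(p == nullptr);
}

bool release_twice_safely() {
    int *p = static_cast<int *>(std::malloc(100));
    if (p == nullptr) {
        return false;
    }
    release(p);
    release(p);  // harmless: p is already null
    return p == nullptr;
}
```

This only protects the pointer that was passed in. Other copies of the same address are not cleared, so it reduces double frees rather than eliminating them.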

Syscall param uninitialised

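Strictly speaking this is not a bug, but it is a security concern, since whatever happened to be in that memory gets handed to the system call.

char buf[100];
write(2, buf, 100);

The result is as follows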

==3803== Syscall param write(buf) points to uninitialised byte(s)
==3803==    at 0x4F23700: __write_nocancel (syscall-template.S:81)
==3803==    by 0x4005C9: main (uninit.cpp:7)
==3803==  Address 0xffefffdf0 is on thread 1's stack
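The simplest way to silence this report is to initialise the buffer before passing it to the system call. In the sketch below, `write_initialised` is a hypothetical helper and the file descriptor is a parameter rather than the hard-coded 2 from the example.

```cpp
#include <unistd.h>

// Hypothetical helper: writes a fully initialised 100-byte buffer to fd
// and returns whatever write() returns.
ssize_t write_initialised(int fd) {
    char buf[100] = {};  // value-initialised: all 100 bytes are zero
    return write(fd, buf, sizeof buf);
}
```

In real code you would normally write only the bytes you actually filled in, rather than the whole buffer.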

Conditional jump or move depends on uninitialised value(s)

This is much like the previous case, except that it happens in user space.

int v;
if (v) {
        printf("OK");
} else {
        printf("Bye");
}

==3898== Conditional jump or move depends on uninitialised value(s)
==3898==    at 0x400539: main (uninit.cpp:7)
==3898==
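The fix is to give the variable a value before branching on it. A minimal sketch, with hypothetical function names and strings returned instead of printed so the result is easy to check:

```cpp
#include <string>

// Same branch as the example, but on a value the caller supplies.
std::string describe(int v) {
    return v ? "OK" : "Bye";
}

// Hypothetical wrapper: v has an explicit initial value, so the branch
// no longer depends on uninitialised memory.
std::string describe_default() {
    int v = 0;
    return describe(v);
}
```

Here describe_default() returns "Bye" because v is explicitly 0.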

Source and destination overlap

When using functions such as memcpy or strcpy, problems can occur if src and dst overlap.

char src[10];
char dst[20];
memcpy(dst, src, 50);

==4108== Source and destination overlap in memcpy(0xffefffe30, 0xffefffe20, 50)
==4108==    at 0x4C2F71C: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4108==    by 0x400658: main (invalid.cpp:11)
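There are two separate fixes, depending on what was intended. If the buffers are distinct, the copy length should not exceed either buffer. If the regions really do overlap, memmove is the function defined for that case. Both helpers below are hypothetical illustrations.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Distinct buffers: bound the length by the smaller of the two sizes.
std::string copy_bounded() {
    char src[10] = "123456789";
    char dst[20];
    const std::size_t n = std::min(sizeof src, sizeof dst);  // 10 bytes
    std::memcpy(dst, src, n);  // copies the 9 digits and the terminator
    return std::string(dst);
}

// Overlapping regions within one buffer: use memmove instead of memcpy.
std::string shift_right_by_two() {
    char buf[16] = "abcdef";
    std::memmove(buf + 2, buf, 7);  // 6 characters plus the terminator
    return std::string(buf);
}
```

copy_bounded() returns "123456789" and shift_right_by_two() returns "ababcdef".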

Using gdb and Valgrind together

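Sometimes you need to use gdb together with valgrind.

$ valgrind --db-attach=yes program argument(s)

When an error occurs, you are offered the following choice

==4222== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ----

Press y to enter gdb and see what happened. Note that --db-attach was removed in Valgrind 3.11; on newer versions the built-in gdbserver (--vgdb=yes together with --vgdb-error=0) serves the same purpose.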