
Debugging the Kernel with KVM

May 12, 2012

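Although a virtual machine run by KVM is also just a process on the host, you cannot debug it by attaching gdb directly to that process the way you can with UML, because KVM and UML work completely differently. If you try, you will find that you have only attached to the qemu-system-x86_64 process itself: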

(gdb) bt
#0  0x00007f8dba022ed2 in select () from /lib64/libc.so.6
#1  0x00007f8dbdd2118a in ?? () from /usr/local/bin/qemu-system-x86_64
#2  0x00007f8dbdd1a798 in main () from /usr/local/bin/qemu-system-x86_64
(gdb) 

To debug the KVM guest kernel with gdb, use two options of qemu-system-x86_64:

-s              shorthand for -gdb tcp::1234
-S              freeze CPU at startup (use 'c' to start execution)

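The -s option lets gdb connect to QEMU remotely for debugging, while -S freezes the virtual CPU at startup, before it executes its very first instruction (firmware code, not yet the kernel image), and keeps it frozen until gdb connects and issues c. Without -S, the guest does not wait and starts running immediately: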

[root@localhost kvm]# qemu-system-x86_64 -hda vdisk.img -net none -m 1024 -daemonize -cpu host -smp 2 -vnc :1 -s

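gdb can then connect to 127.0.0.1:1234 or simply :1234 (when gdb runs on the host itself), or to 192.168.1.1:1234 (when gdb runs on another machine and the KVM host's IP is 192.168.1.1). Here gdb is run on the host itself: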

[root@localhost kvm]# gdb
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
(gdb) target remote :1234
Remote debugging using :1234
[New Thread 1]
Remote 'g' packet reply is too long: d85f8780ffffffff88f58680ffffffff00000000000000000000000000000000180000000000000020fb7c80ffffffff40318880ffffffff205f8780ffffffff000000000000000063c3dd712e00000072feff00000000004bb52180ffffffffb76ddbb66ddbb66d20748b80ffffffffc09c8b80ffffffff0000000000000000241c2280ffffffff4602000010000000180000001800000018000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f03000000000000000000000000000000000000000000000000000000000000000000000000e03f00000000000000007b14ae47e17a843f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000a01f0000
(gdb) 

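If you get the error above, first run set architecture i386:x86-64:intel so that gdb knows the architecture of the remote system. My KVM guest is x86-64:

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.30-gentoo-r8 #55 SMP Thu May 10 20:05:44 CST 2012 x86_64 x86_64 x86_64 GNU/Linux

After setting the architecture, connect again: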

(gdb) set architecture i386:x86-64:intel
The target architecture is assumed to be i386:x86-64:intel
(gdb) target remote :1234
Remote debugging using :1234
[New Thread 1]
0xffffffff80221c24 in ?? ()
(gdb) 
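To see why the first connection failed, it can help to look at the packets themselves. In gdb's Remote Serial Protocol every packet is framed as $payload#checksum, and the reply to the g (read all registers) request is a hex dump of the registers in target byte order. The Python sketch below is illustrative only and not part of the workflow above: the helper names are hypothetical, and it assumes gdb's x86-64 register order (rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8 to r15, then rip). Decoding the start of the reply quoted above as 64-bit little-endian values yields kernel addresses such as rip = 0xffffffff80221c24, the same address gdb reports once the architecture is set, which suggests the stub was already sending 64-bit registers that gdb did not expect.

```python
import struct

# Assumed register order of gdb's x86-64 layout: 16 general-purpose
# registers followed by rip, each sent as 8 little-endian bytes.
X86_64_REGS = [
    "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "rip",
]


def rsp_frame(payload: str) -> str:
    """Frame a Remote Serial Protocol packet as $payload#checksum.

    The checksum is the sum of the payload bytes modulo 256, printed as
    two lowercase hex digits. Escaping of '$', '#' and '}' is not handled.
    """
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def decode_g_reply(reply: str) -> dict:
    """Decode the leading 64-bit registers of a 'g' packet reply."""
    regs = {}
    for i, name in enumerate(X86_64_REGS):
        chunk = reply[i * 16:(i + 1) * 16]  # 8 bytes = 16 hex characters
        if len(chunk) < 16:
            break  # reply shorter than expected: stop decoding
        (value,) = struct.unpack("<Q", bytes.fromhex(chunk))
        regs[name] = value
    return regs


if __name__ == "__main__":
    print(rsp_frame("g"))  # the request gdb sends to read all registers
    # First 17 registers (rax ... rip) of the reply quoted above.
    reply = "".join([
        "d85f8780ffffffff", "88f58680ffffffff", "0000000000000000",
        "0000000000000000", "1800000000000000", "20fb7c80ffffffff",
        "40318880ffffffff", "205f8780ffffffff", "0000000000000000",
        "63c3dd712e000000", "72feff0000000000", "4bb52180ffffffff",
        "b76ddbb66ddbb66d", "20748b80ffffffff", "c09c8b80ffffffff",
        "0000000000000000", "241c2280ffffffff",
    ])
    print(hex(decode_g_reply(reply)["rip"]))
```

Run as a script, this prints $g#67 and then 0xffffffff80221c24.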

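Next, load the matching guest kernel image, which must of course be the uncompressed vmlinux (make sure the kernel was built with both "[*] Compile the kernel with debug info" and "[*] Compile the kernel with frame pointers" enabled):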

(gdb) file /tmp/vmlinux
A program is being debugged already.
Are you sure you want to change the file? (y or n) y
Reading symbols from /tmp/vmlinux...done.
(gdb) bt
#0  native_safe_halt () at /usr/src/linux-2.6.37.2/arch/x86/include/asm/irqflags.h:51
#1  0xffffffff80211e41 in default_idle ()
    at /usr/src/linux-2.6.37.2/arch/x86/include/asm/paravirt.h:802
#2  0xffffffff8020ab67 in cpu_idle ()
    at /usr/src/linux-2.6.37.2/arch/x86/kernel/process_64.c:149
#3  0xffffffff8061ab0d in rest_init () at /usr/src/linux-2.6.37.2/init/main.c:474
#4  0xffffffff808adcda in start_kernel () at /usr/src/linux-2.6.37.2/init/main.c:701
#5  0xffffffff808ad2a7 in x86_64_start_reservations (
    real_mode_data=<value optimized out>)
    at /usr/src/linux-2.6.37.2/arch/x86/kernel/head64.c:123
#6  0xffffffff808ad39f in x86_64_start_kernel (
    real_mode_data=0x93050 <Address 0x93050 out of bounds>)
    at /usr/src/linux-2.6.37.2/arch/x86/kernel/head64.c:94
#7  0x0000000000000000 in ?? ()
(gdb) 
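The two menu entries above correspond to the kernel options CONFIG_DEBUG_INFO and CONFIG_FRAME_POINTER. As a minimal sketch (the helper name is hypothetical, and it assumes the guest kernel's .config is available on the host), the following Python function reports which of the two are not set to y in a .config file:

```python
REQUIRED_OPTIONS = ["CONFIG_DEBUG_INFO", "CONFIG_FRAME_POINTER"]


def missing_debug_options(config_text: str) -> list:
    """Return the required options that are not set to 'y' in a .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FRAME_POINTER is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if value == "y":
            enabled.add(name)
    return [opt for opt in REQUIRED_OPTIONS if opt not in enabled]
```

For example, pass it the contents of the guest kernel's source-tree .config (for instance /usr/src/linux-2.6.37.2/.config, an assumed location) and rebuild the kernel if the returned list is not empty.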

To set a breakpoint on __schedule, let the guest continue, interrupt it with Ctrl-C, and then set the breakpoint:

(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
native_safe_halt () at /usr/src/linux-2.6.37.2/arch/x86/include/asm/irqflags.h:51
51	/usr/src/linux-2.6.37.2/arch/x86/include/asm/irqflags.h: No such file or directory.
	in /usr/src/linux-2.6.37.2/arch/x86/include/asm/irqflags.h
(gdb) b __schedule
Breakpoint 1 at 0xffffffff80636792: file /usr/src/linux-2.6.37.2/kernel/sched.c, line 5022.
(gdb) c
Continuing.
[New Thread 4]
[Switching to Thread 4]

Breakpoint 1, __schedule () at /usr/src/linux-2.6.37.2/kernel/sched.c:5022
5022	/usr/src/linux-2.6.37.2/kernel/sched.c: No such file or directory.
	in /usr/src/linux-2.6.37.2/kernel/sched.c
(gdb) bt
#0  __schedule () at /usr/src/linux-2.6.37.2/kernel/sched.c:5022
#1  0xffffffff80636f51 in schedule () at /usr/src/linux-2.6.37.2/kernel/sched.c:5084
#2  0xffffffff8020ab88 in cpu_idle ()
    at /usr/src/linux-2.6.37.2/arch/x86/kernel/process_64.c:159
#3  0xffffffff80632a4e in start_secondary (unused=<value optimized out>)
    at /usr/src/linux-2.6.37.2/arch/x86/kernel/smpboot.c:329
#4  0x0000000000000000 in ?? ()
(gdb) 
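Typing the same sequence on every connection gets tedious, and gdb can read commands from a file given with -x. The Python sketch below, with a hypothetical helper name, simply generates such a command file from the steps shown so far. It loads vmlinux before connecting, an ordering choice of this sketch that avoids gdb's "A program is being debugged already" confirmation; the default path /tmp/vmlinux and port 1234 are taken from the session above.

```python
def gdb_script(vmlinux: str = "/tmp/vmlinux", host: str = "",
               port: int = 1234, breakpoints=()) -> str:
    """Build a gdb command file for attaching to QEMU's gdb stub."""
    lines = [
        f"file {vmlinux}",                     # load symbols first
        "set architecture i386:x86-64:intel",  # as in the session above
        f"target remote {host}:{port}",        # empty host means localhost
    ]
    lines += [f"break {bp}" for bp in breakpoints]
    lines.append("continue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gdb_script(breakpoints=["__schedule"]), end="")
```

Redirect the output to a file such as kvm.gdb and start the debugger with gdb -x kvm.gdb.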

If quitting gdb with q causes the KVM guest to terminate, run detach first and then quit gdb:

(gdb) q
The program is running.  Exit anyway? (y or n) n
Not confirmed.
(gdb) detach 
Ending remote debugging.
(gdb) q
[root@localhost kvm]# 

A serial port is a great help for kernel debugging. A serial port can be added to the KVM guest like this:

[root@localhost kvm]# qemu-system-x86_64 -hda vdisk.img -net none -m 1024 -daemonize -cpu host -smp 2 -vnc :1 -s -serial file:serial.log

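This creates a file named serial.log in the current directory and redirects the guest's serial output into it. For example, after adding the serial console option console=ttyS0,115200 to the guest kernel's command line, the guest kernel's messages are written to this file: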

[root@localhost kvm]# ls serial.log -lh
-rw-r----- 1 root root 21K May 11 16:56 serial.log
[root@localhost kvm]# 
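Once kernel messages land in serial.log, the interesting lines can be pulled out automatically. This Python sketch is only an illustration: the helper name is hypothetical, and the patterns are assumed examples of common kernel failure messages that you will likely want to adjust.

```python
import re

# Assumed example patterns for typical kernel failure messages.
ERROR_PATTERNS = re.compile(r"Kernel panic|Oops|BUG:|Call Trace:")


def kernel_errors(lines):
    """Return (line_number, text) pairs for lines matching ERROR_PATTERNS."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, 1)
        if ERROR_PATTERNS.search(line)
    ]
```

A file object can be passed directly, for example by opening serial.log with errors="replace" and iterating over kernel_errors(f).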

The guest's serial port can also be redirected to a listening TCP port:

[root@localhost kvm]# qemu-system-x86_64 -hda vdisk.img -net none -m 1024 -daemonize -cpu host -smp 2 -vnc :1 -s -serial tcp::1235,server
QEMU waiting for connection on: tcp:0.0.0.0:1235,server

After it starts, qemu-system-x86_64 waits for a connection. On the host itself you can run (in another terminal, of course):

[root@localhost ~]# telnet 127.0.0.1 1235

From another machine, run (as mentioned earlier, the KVM host's IP here is 192.168.1.1):

[root@localhost ~]# telnet 192.168.1.1 1235

From then on, all of the guest's serial output is printed in the telnet session, and you can also log in to the guest through this serial channel.
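
telnet is not strictly required: the port opened by -serial tcp::1235,server is a plain TCP socket, so a short script can capture the output instead, for example to save a boot log. The following Python sketch is an assumption-laden illustration (the helper name and the 5-second idle timeout are hypothetical); it only reads and does not forward keystrokes, so it is no replacement for an interactive login.

```python
import socket


def read_serial(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Collect bytes from a QEMU TCP serial port.

    Stops when the peer closes the connection or when no data arrives
    for `timeout` seconds.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # idle for too long: assume output has paused
            if not data:
                break  # QEMU closed the connection
            chunks.append(data)
    return b"".join(chunks)
```

For example, read_serial("192.168.1.1", 1235) returns whatever the guest printed until it fell silent.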
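
I also ran into a problem: when connected to the KVM guest remotely with VNC Viewer 4 on Windows, the keyboard stops responding as soon as GRUB appears. Moving up and down through the kernel entries, editing them, or booting one all fail, nothing at all can be done, and the only way out is to kill qemu-system-x86_64 on the host. This is very inconvenient when upgrading kernels, but fortunately qemu-system-x86_64 can be given a kernel image directly from outside the guest (see qemu-system-x86_64 --help for details):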

[root@localhost kvm]# qemu-system-x86_64 -hda vdisk.img -net none -m 1024 -daemonize -cpu host -smp 2 -vnc :1 -kernel vmlinuz-2.6.18-194.el5 -initrd initrd-2.6.18-194.el5.img
[root@localhost kvm]# 

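So, as soon as the initial KVM guest is installed, copy these two files to the host. That way, if later experiments with other kernels break something, you can still boot into the guest this way and repair it. (Other tools such as http://libguestfs.org/ might also work, but they are an indirect and more cumbersome route.)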
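
Booting with -kernel also makes it easy to combine the tricks above, since the guest kernel's command line can be passed with QEMU's -append option, for example to add console=ttyS0,115200. The Python sketch below only assembles an argument list mirroring the commands in this article; the helper name is hypothetical, and so is the root= value in the usage line, which must match the guest's actual root device because -append supplies the whole kernel command line that GRUB would otherwise provide.

```python
import shlex


def qemu_direct_boot_argv(disk, kernel, initrd, append=None,
                          gdb=False, freeze=False, serial=None):
    """Build a qemu-system-x86_64 argv that boots an external kernel."""
    argv = [
        "qemu-system-x86_64", "-hda", disk, "-net", "none", "-m", "1024",
        "-daemonize", "-cpu", "host", "-smp", "2", "-vnc", ":1",
        "-kernel", kernel, "-initrd", initrd,
    ]
    if append:
        argv += ["-append", append]  # full guest kernel command line
    if gdb:
        argv.append("-s")  # gdb stub on tcp::1234
    if freeze:
        argv.append("-S")  # wait for gdb before the first instruction
    if serial:
        argv += ["-serial", serial]  # e.g. "file:serial.log"
    return argv


if __name__ == "__main__":
    print(shlex.join(qemu_direct_boot_argv(
        "vdisk.img", "vmlinuz-2.6.18-194.el5", "initrd-2.6.18-194.el5.img",
        append="root=/dev/hda1 console=ttyS0,115200",  # hypothetical root=
        gdb=True, serial="file:serial.log")))
```

The list can be printed as above or passed straight to subprocess.run(argv, check=True).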
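
Debugging modules in the KVM guest takes a bit more work: you first have to load the module's symbols into gdb explicitly, and at the correct address. The module's load address can be looked up inside the guest with: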

[root@localhost ~]# cat /proc/modules 
igb 84012 0 - Live 0xffffffffa0007000
dca 6468 1 igb, Live 0xffffffffa0000000
[root@localhost ~]# 

Only two modules are loaded. Taking igb as an example, run add-symbol-file in gdb on the host, where /tmp/igb.ko is the guest's igb module file copied to the host, and 0xffffffffa0007000 is the address shown in /proc/modules above:

(gdb) add-symbol-file /tmp/igb.ko 0xffffffffa0007000
add symbol table from file "/tmp/igb.ko" at
	.text_addr = 0xffffffffa0007000
(y or n) y
Reading symbols from /tmp/igb.ko...done.
(gdb) c
Continuing.
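With many modules loaded, the add-symbol-file lines can be generated from /proc/modules, whose sixth field is the load address used above. This Python sketch is hypothetical (helper name, and the assumption that every .ko has been copied to /tmp on the host). It follows the article in using the /proc/modules base address; on kernels that expose it, /sys/module/<name>/sections/.text gives the exact address of the .text section, which is what add-symbol-file formally expects.

```python
def add_symbol_file_cmds(proc_modules: str, ko_dir: str = "/tmp") -> list:
    """Turn /proc/modules text into gdb add-symbol-file commands.

    Each line looks like: name size refcount deps state address [flags].
    Read /proc/modules as root: newer kernels may show zeroed addresses
    to unprivileged users. Module names use '_', while some .ko file
    names use '-', so the generated path may need adjusting.
    """
    cmds = []
    for line in proc_modules.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name, address = fields[0], fields[5]
        cmds.append(f"add-symbol-file {ko_dir}/{name}.ko {address}")
    return cmds
```

Fed the two lines shown earlier, it produces add-symbol-file /tmp/igb.ko 0xffffffffa0007000 and the corresponding dca command.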

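Try setting a breakpoint on igb_clean_tx_irq in the igb module. It is hit immediately (because my SSH connection to the guest goes through the igb interface), so everything seems to work: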

(gdb) b igb_clean_tx_irq 
Breakpoint 2 at 0xffffffffa000a5a8
(gdb) c
Continuing.
[New Thread 2]
[Switching to Thread 2]

Breakpoint 2, 0xffffffffa000a5a8 in igb_clean_tx_irq ()
(gdb) bt
#0  0xffffffffa000a5a8 in igb_clean_tx_irq ()
#1  0xffffffffa000c19e in igb_msix_tx ()
#2  0xffffffff8027cb92 in handle_IRQ_event (irq=27, action=0xffff88003e18bf40)
    at /usr/src/linux-2.6.37.2/kernel/irq/handle.c:371
#3  0xffffffff8027e9f0 in handle_edge_irq (irq=27, desc=0xffff88003e6a85c0)
    at /usr/src/linux-2.6.37.2/kernel/irq/chip.c:514
#4  0xffffffff8020de43 in handle_irq (irq=27, regs=<value optimized out>)
    at /usr/src/linux-2.6.37.2/include/linux/irq.h:312
#5  0xffffffff8020d6a1 in do_IRQ (regs=0xffff88003f89de18) at /usr/src/linux-2.6.37.2/arch/x86/kernel/irq.c:215
#6  0xffffffff8020c453 in common_interrupt ()
#7  0xffff88003f89de40 in ?? ()
#8  0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Without the corresponding add-symbol-file command, you get this instead:

(gdb) b igb_clean_tx_irq 
Function "igb_clean_tx_irq" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n

Even if you answer y, the breakpoint will never be hit later.

When reposting, please keep the original address: http://www.lenky.info/archives/2012/05/1624 or http://lenky.info/?p=1624


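Note: Unless otherwise stated, the content of this article reflects Lenky's own genuine understanding rather than deliberate speculation meant to mislead readers. My expertise is limited, and although I strive for accuracy, errors are unavoidable; please bear with me, and if you can, leave a comment pointing them out. Discussion is welcome. It is also worth noting that some of Lenky's articles, and parts of others, draw on material generously shared online by other users, especially articles that are entirely based on references; the links attached to them may be more direct and more complete, while I have merely summarized and restated them, and I thank those authors here as well. All technical articles on this site may be reposted, but please follow the Creative Commons license; the more personal, mood-type essays are best not reposted.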

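Legal: Under the newly promulgated Regulations on the Protection of the Right to Network Dissemination of Information, if you believe any content of this article infringes your rights, please notify us by email or in writing, and this site will promptly remove the relevant content or links.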

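  1. April 16, 2014 10:32 | #1

    Hello! I found your site after coming across the e-book 《深入剖析nginx》 (In-Depth Analysis of Nginx) online. I recently planned to study the nginx source code, using your book as a companion, and came across this article as well. I had previously tried debugging with qemu too, but found that I could not set a breakpoint in start_kernel. Later I built a kernel debugging environment with gdb and bochs, in which breakpoints can be set in places like start_kernel, and more generally anywhere. I hope we can exchange ideas in the future.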
