Process Creation, #4: sys_fork (Core Implementation)

January 11, 2007

Continuing from the idea in the previous entry: "Linux creates new processes through sys_fork and sys_clone, and both system calls end up calling do_fork(), which is Linux's main fork routine."

Seen from the implementation side, the job of do_fork() is to copy the original process into a new one. The internals of Linux's sys_fork() are built around exactly this "copy process" approach: when a user program calls the fork() wrapper function, sys_fork() copies the calling process to obtain a new process.

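From this it follows that the key questions about the internals of sys_fork() are:

1. How is a process copied?

2. Which parts of the process are copied?

Both questions are well worth digging into, and studying the kernel's "copy process" implementation also reinforces the concept of the process address space. Below, we first trace the internal flow of sys_fork() in outline and then turn to the "copy process" topic; "copy what" is left for the column on clone().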

First, sys_fork() and sys_clone() both call the do_fork routine. Here is the source of do_fork():

/*
 *  Ok, this is the main fork-routine.
 *
 * It copies the process, and if successful kick-starts
 * it and waits for it to finish using the VM if required.
 */
long do_fork(unsigned long clone_flags,
	      unsigned long stack_start,
	      struct pt_regs *regs,
	      unsigned long stack_size,
	      int __user *parent_tidptr,
	      int __user *child_tidptr)
{
	struct task_struct *p; // see note 1
	int trace = 0;
	struct pid *pid = alloc_pid(); // see note 2
	long nr;

	if (!pid)
		return -EAGAIN;
	nr = pid->nr;
	if (unlikely(current->ptrace)) {
		trace = fork_traceflag (clone_flags);
		if (trace)
			clone_flags |= CLONE_PTRACE;
	}

	// see note 3
	p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, nr);
	/*
	 * Do this prior waking up the new thread - the thread pointer
	 * might get invalid after that point, if the thread exits quickly.
	 */
	if (!IS_ERR(p)) {
		struct completion vfork;

		if (clone_flags & CLONE_VFORK) {
			p->vfork_done = &vfork;
			init_completion(&vfork);
		}

		if ((p->ptrace & PT_PTRACED) || (clone_flags & CLONE_STOPPED)) {
			/*
			 * We'll start up with an immediate SIGSTOP.
			 */
			sigaddset(&p->pending.signal, SIGSTOP);
			set_tsk_thread_flag(p, TIF_SIGPENDING);
		}

		if (!(clone_flags & CLONE_STOPPED))
			wake_up_new_task(p, clone_flags);
		else
			p->state = TASK_STOPPED;

		if (unlikely (trace)) {
			current->ptrace_message = nr;
			ptrace_notify ((trace << 8) | SIGTRAP);
		}

		if (clone_flags & CLONE_VFORK) {
			wait_for_completion(&vfork);
			if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
				ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
		}
	} else {
		free_pid(pid);
		nr = PTR_ERR(p);
	}
	return nr;
}


1. Declare a process descriptor.

2. Request a PID for the new process.

3. Call copy_process() to copy out the new process.

From this we can see that copy_process() is the key process-creation function inside the Linux kernel.

Next, let's pull up the source of copy_process():

/*
 * This creates a new process as a copy of the old one,
 * but does not actually start it yet.
 *
 * It copies the registers, and all the appropriate
 * parts of the process environment (as per the clone
 * flags). The actual kick-off is left to the caller.
 */
static task_t *copy_process(unsigned long clone_flags,
				 unsigned long stack_start,
				 struct pt_regs *regs,
				 unsigned long stack_size,
				 int __user *parent_tidptr,
				 int __user *child_tidptr,
				 int pid)

copy_process() is fairly long, so only its function prototype is listed here.

Look at the first parameter of copy_process(), clone_flags. It is originally passed in by sys_fork() or sys_clone(), and copy_process() uses clone_flags to decide "copy what".

So how do we know which values clone_flags can take? They are defined in the <linux/sched.h> header; the following are the bitwise definitions of the clone flags:

/*
 * cloning flags:
 */
#define CSIGNAL		0x000000ff	/* signal mask to be sent at exit */
#define CLONE_VM		0x00000100	/* set if VM shared between processes */
#define CLONE_FS		0x00000200	/* set if fs info shared between processes */
#define CLONE_FILES	0x00000400	/* set if open files shared between processes */
#define CLONE_SIGHAND	0x00000800	/* set if signal handlers and blocked signals shared */
#define CLONE_PTRACE	0x00002000	/* set if we want to let tracing continue on the child too */
#define CLONE_VFORK	0x00004000	/* set if the parent wants the child to wake it up on mm_release */
#define CLONE_PARENT	0x00008000	/* set if we want to have the same parent as the cloner */
#define CLONE_THREAD	0x00010000	/* Same thread group? */
#define CLONE_NEWNS	0x00020000	/* New namespace group? */
#define CLONE_SYSVSEM	0x00040000	/* share system V SEM_UNDO semantics */
#define CLONE_SETTLS	0x00080000	/* create a new TLS for the child */
#define CLONE_PARENT_SETTID	0x00100000	/* set the TID in the parent */
#define CLONE_CHILD_CLEARTID	0x00200000	/* clear the TID in the child */
#define CLONE_DETACHED		0x00400000	/* Unused, ignored */
#define CLONE_UNTRACED		0x00800000	/* set if the tracing process can't force CLONE_PTRACE on this clone */
#define CLONE_CHILD_SETTID		0x01000000	/* set the TID in the child */
#define CLONE_STOPPED		0x02000000	/* Start in stopped state */

/*
 * List of flags we want to share for kernel threads,
 * if only because they are not used by them anyway.
 */
#define CLONE_KERNEL	(CLONE_FS | CLONE_FILES | CLONE_SIGHAND)

Now look at the implementation of sys_fork():

asmlinkage int sys_fork(struct pt_regs regs)
{
	return do_fork(SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
}

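A caller of the fork() wrapper function has no way to specify clone flags of its own: sys_fork() always passes just SIGCHLD. So until you have learned how to use clone(), you can skip the clone flags for now.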

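That completes the trace of the internals of sys_fork(). We now understand what the clone flags are for, but since sys_fork() does not set any of them, we will not discuss them yet. Our sys_fork() trace homework is not finished, though: the next entry will be "Process Creation, #5: copy process".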

All kernel traces above use the Linux source code.

