Analysis of a Crash Caused by Reassigning a Joinable Thread Object

paw5zx

Introduction

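This came up in a camera disconnect-and-reconnect design. CameraDevice::StartHealthCheckThread() and CameraDevice::StopHealthCheckThread() were poorly designed, and as a result the program crashed as soon as the camera reconnected successfully. It reported: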

[2025-06-20 00:51:40,939](ERROR)<13169terminate called after throwing an instance of 'std::runtime_error'
what(): camera [0] disconnected!

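From the error message alone, it looks as if a thrown exception was never caught and std::terminate() killed the program. That is not what actually happened.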

The Flawed Design

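First, let's look at the old design. Only the functions that directly caused the error are shown, starting with the reconnect event handler: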

class ReconnectHandler : public ConfigurationEventHandler
{
public:
    void OnCameraDeviceRemoved(CameraDevice& camera) override
    {
        camera.StopGrabbing();
        camera.CloseDevice();

        // Reconnect details omitted; essentially it keeps trying to open the camera in a loop
        camera.OpenDevice();
    }

    // Called synchronously after the camera has been opened successfully
    void OnOpened(CameraDevice& camera) override
    {
        ...
        camera.StartHealthCheckThread();
    }

    // Called synchronously before the camera close operation begins
    void OnClose(CameraDevice& camera) override
    {
        ...
        camera.StopHealthCheckThread();
    }
};

Next come StartHealthCheckThread() and StopHealthCheckThread():

void CameraDevice::StartHealthCheckThread()
{
    // State checks
    ...
    // Heartbeat thread
    health_check_thread_ = std::thread([this]()
    {
        try
        {
            while (is_health_check_thread_running_)
            {
                // Once the camera disconnects, SendHeartBeat() throws runtime_error: "camera [x] disconnected!"
                SendHeartBeat();
                std::this_thread::sleep_for(std::chrono::seconds(1));
            }
        }
        catch (const std::exception& e)
        {
            // State reset
            ...

            // Synchronous call: OnCameraDeviceRemoved() -> OpenDevice() -> OnOpened()
            // -> StartHealthCheckThread() all run on this very thread
            NotifyEventHandeler(ConfigurationEvent::OnCameraDeviceRemoved);
            return;
        }
    });
}

void CameraDevice::StopHealthCheckThread()
{
    // State reset
    ...

    std::thread wait_thread([this]()
    {
        if (health_check_thread_.joinable())
        {
            health_check_thread_.join();
        }
    });
    wait_thread.detach();
}

Root Cause Analysis

I drew a simple sequence diagram showing how the threads interact:

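The root cause is this. Once the camera reconnects successfully (i.e. OnOpened() is called), StartHealthCheckThread() is called synchronously within the same call chain. That chain begins in the catch block of the health-check thread (NotifyEventHandeler() → OnCameraDeviceRemoved() → OpenDevice() → OnOpened()). So the thread performing the assignment is the old health-check thread itself, which is still running and has not been reclaimed. Assigning a new thread to health_check_thread_ while it still owns that old thread makes the program crash immediately.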
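A small self-contained sketch makes this call chain concrete. It does not use the camera SDK. `RestartRunsOnOldThread`, `worker` and the `restart` lambda are hypothetical stand-ins, with `restart` playing the role of `StartHealthCheckThread()`. Because the catch block calls it synchronously, the thread that would perform the reassignment is the same thread the `std::thread` object still owns.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <thread>

// Hypothetical reduction of the original call chain: the worker's catch block
// calls the "restart" step synchronously, as NotifyEventHandeler() did.
bool RestartRunsOnOldThread()
{
    std::promise<std::thread::id> restart_caller;
    std::future<std::thread::id> caller_future = restart_caller.get_future();

    // Stand-in for StartHealthCheckThread(): only records which thread called it
    auto restart = [&restart_caller] {
        restart_caller.set_value(std::this_thread::get_id());
    };

    std::thread worker([&restart] {
        try {
            throw std::runtime_error("camera [0] disconnected!");  // simulated SendHeartBeat() failure
        } catch (const std::exception&) {
            restart();  // synchronous notification, still inside the catch block
        }
    });

    std::thread::id worker_id = worker.get_id();
    std::thread::id caller_id = caller_future.get();  // blocks until restart() has run
    assert(caller_id != std::this_thread::get_id());  // the restart did not run on this thread
    worker.join();
    return caller_id == worker_id;  // true: the old thread is the one doing the "restart"
}
```

The function returns `true`. In the real code, that means `health_check_thread_` is necessarily still joinable at the moment it is reassigned.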

To confirm the cause, I reduced the flow to a piece of test code:

#include <iostream>
#include <atomic>
#include <chrono>
#include <stdexcept>
#include <thread>

void StartHealthCheckThread();
void StopHealthCheckThread();

std::atomic<bool> is_health_check_thread_running_(false);
std::thread health_check_thread_;

void SendHeartBeat()
{
    static int count = 0;
    count++;
    std::cout << "Sending heartbeat... Count: " << count << std::endl;
    // if(count > 5)
    // count == 6 (rather than count > 5) ensures SendHeartBeat() does not throw again after the reconnect
    if (count == 6) // Simulate an error after 5 heartbeats
    {
        throw std::runtime_error("Simulated heartbeat failure");
    }
}

void OnCameraDeviceRemoved()
{
    StopHealthCheckThread();

    // Simulate the time the reconnect takes
    std::this_thread::sleep_for(std::chrono::seconds(5));

    StartHealthCheckThread();
}

void StopHealthCheckThread()
{
    std::cout << "CameraDevice::StopHealthCheckThread" << std::endl;
    is_health_check_thread_running_ = false;
    std::thread wait_thread([]()
    {
        if (health_check_thread_.joinable())
        {
            health_check_thread_.join();
        }

        std::cout << "CameraDevice::StopHealthCheckThread thread stopped" << std::endl;
    });

    wait_thread.detach(); // non-blocking reclaim
    std::cout << "CameraDevice::StopHealthCheckThread thread detached" << std::endl;
}


void StartHealthCheckThread()
{
    std::cout << "CameraDevice::StartHealthCheckThread" << std::endl;

    if (is_health_check_thread_running_)
    {
        return;
    }

    is_health_check_thread_running_ = true;
    // Heartbeat thread
    health_check_thread_ = std::thread([]()
    {
        std::cout << "entering health check thread..." << std::endl;
        try
        {
            while (is_health_check_thread_running_)
            {
                // CRRC_INFO("this is thread 2");
                SendHeartBeat();
                std::this_thread::sleep_for(std::chrono::seconds(1));
            }
        }
        catch (const std::exception& e)
        {
            std::cerr << "CameraDevice::StartHealthCheckThread exception: " << e.what() << std::endl;
            is_health_check_thread_running_ = false;

            // Called synchronously, like NotifyEventHandeler() in the original code:
            // the reconnect, including StartHealthCheckThread(), runs on this very thread
            OnCameraDeviceRemoved();
            return;
        }
    });
}

int main()
{
    StartHealthCheckThread();
    while (true)
    {
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}

I then ran it under gdb. The output was:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
CameraDevice::StartHealthCheckThread
[New Thread 0x7ffff798c700 (LWP 1305860)]
entering health check thread...
Sending heartbeat... Count: 1
Sending heartbeat... Count: 2
Sending heartbeat... Count: 3
Sending heartbeat... Count: 4
Sending heartbeat... Count: 5
Sending heartbeat... Count: 6
CameraDevice::StartHealthCheckThread exception: Simulated heartbeat failure
CameraDevice::StopHealthCheckThread
[New Thread 0x7ffff718b700 (LWP 1305862)]
CameraDevice::StopHealthCheckThread thread detached
CameraDevice::StartHealthCheckThread
[New Thread 0x7ffff698a700 (LWP 1305866)]
entering health check thread...
Sending heartbeat... Count: 7
terminate called after throwing an instance of 'std::runtime_error'
what(): Simulated heartbeat failure

Thread 2 "test.out" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff798c700 (LWP 1305860)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50

The bt command shows the call stack:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7b03859 in __GI_abort () at abort.c:79
#2 0x00007ffff7d9cee6 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff7daef8c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff7daeff7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00005555555571a3 in std::thread::operator= (this=0x55555555b280 <health_check_thread_>, __t=...)
at /usr/include/c++/11/bits/std_thread.h:165
#6 0x0000555555556834 in StartHealthCheckThread () at test.cpp:87
#7 0x00005555555564f3 in OnCameraDeviceRemoved () at test.cpp:34
#8 0x0000555555556765 in operator() (__closure=0x55555556e6c8) at test.cpp:84
#9 0x0000555555556fe5 in std::__invoke_impl<void, StartHealthCheckThread()::<lambda()> >(std::__invoke_other, struct {...} &&) (
__f=...) at /usr/include/c++/11/bits/invoke.h:61

In this backtrace, frame #5:

#5  0x00005555555571a3 in std::thread::operator= (this=0x55555555b280 <health_check_thread_>, __t=...)
at /usr/include/c++/11/bits/std_thread.h:165

shows that the function being executed is std::thread::operator=, the (move) assignment operator of std::thread. It is assigning a new std::thread object to health_check_thread_. Frame #4:

#4  0x00007ffff7daeff7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6

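shows what happened inside the assignment. health_check_thread_ still managed a joinable thread, so the assignment operator called std::terminate() directly and ended the program. The standard requires this: if the target std::thread is joinable, move assignment calls std::terminate().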
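Note that `joinable()` describes ownership, not liveness. The sketch below (the function name `FinishedThreadStillJoinable` is illustrative) shows that a thread whose function has already returned keeps its `std::thread` object joinable until `join()` or `detach()` is called. So, in principle, the move assignment would terminate even if the old heartbeat thread had finished but had not yet been reclaimed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// joinable() reports whether the std::thread object still owns a thread of
// execution, not whether that thread is still running.
bool FinishedThreadStillJoinable()
{
    std::atomic<bool> done{false};
    std::thread t([&done] { done = true; });  // the thread function ends right away

    while (!done) {
        std::this_thread::yield();
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // give the function time to return

    bool still_joinable = t.joinable();  // true: nobody has joined or detached it yet
    t.join();                            // reclaim the thread; t no longer owns one
    assert(!t.joinable());
    t = std::thread([] {});              // safe: t is not joinable, so operator= does not terminate
    t.join();
    return still_joinable;
}
```

The function returns `true`. The reassignment at the end is safe only because of the `join()` before it.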

A Follow-up Question

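We now know why the program crashed, but one question remains. The program was ended by std::terminate(), so why does the final message refer to the std::runtime_error thrown by SendHeartBeat()?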

terminate called after throwing an instance of 'std::runtime_error'
what(): Simulated heartbeat failure

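The answer lies in the lifetime of the exception object. When a catch block finishes, normally at its closing brace, the exception object is destroyed. In this example, however, std::terminate() was triggered before the catch block finished, so the exception object was still active. std::terminate() invokes the terminate handler, and libstdc++'s default handler checks whether an exception is currently active to decide what to print. Here it found the previously thrown std::runtime_error still active, so it printed the exception's type and its what() message: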

terminate called after throwing an instance of 'std::runtime_error'
what(): Simulated heartbeat failure

A simple example verifies this:

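#include <exception>
#include <stdexcept>

void TerminateFunc()
{
    std::terminate();  // no exception is active here
}

void TryFunc()
{
    try
    {
        throw std::runtime_error("Simulated error in main function");
    }
    catch (const std::exception&)
    {
        // Still inside the handler: the exception is active when terminate() runs
        std::terminate();
    }
}

int main()
{
    // TerminateFunc();
    TryFunc();

    return 0;
}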

Running TryFunc() prints:

terminate called after throwing an instance of 'std::runtime_error'
what(): Simulated error in main function

Running TerminateFunc() instead (uncomment its call and comment out TryFunc()) prints:

terminate called without an active exception

Both results match expectations.
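The active exception can also be observed without ending the program. This sketch (the names `ExceptionProbe` and `ProbeActiveException` are hypothetical) uses `std::current_exception()`. That function returns a non-null `std::exception_ptr` only while an exception is being handled. The sketch assumes it is not itself called from inside a catch block.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

struct ExceptionProbe {
    bool active_inside_catch;  // was an exception being handled inside the catch block?
    bool active_after_catch;   // was one still being handled after the catch block?
};

ExceptionProbe ProbeActiveException()
{
    // Precondition: not called from inside another catch block
    assert(std::current_exception() == nullptr);

    ExceptionProbe probe{false, false};
    try {
        throw std::runtime_error("Simulated heartbeat failure");
    } catch (const std::exception&) {
        // The point at which TryFunc() called std::terminate()
        probe.active_inside_catch = (std::current_exception() != nullptr);
    }
    // The handler has exited, so no exception is being handled any more
    probe.active_after_catch = (std::current_exception() != nullptr);
    return probe;
}
```

Called from `main`, the probe reports `active_inside_catch == true` and `active_after_catch == false`. This matches the two terminate messages above.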

Solution

The fundamental requirement is that the old thread must be safely reclaimed before the std::thread object is assigned a new value.

So two changes are needed:

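  • Notify OnCameraDeviceRemoved asynchronously, so that the old heartbeat thread exits promptly instead of running the reconnect (and the reassignment) itself
  • Before creating the new heartbeat thread, check health_check_thread_.joinable() and join the old thread if it is still joinable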

Accordingly, StartHealthCheckThread() and StopHealthCheckThread() were changed to:

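void CameraDevice::StartHealthCheckThread()
{
    // State checks
    ...

    // Prevent a double join: the detached waiter in StopHealthCheckThread() takes the
    // same lock. Holding it until the reassignment is done also keeps the waiter from
    // reading health_check_thread_ while it is being reassigned.
    std::lock_guard<std::mutex> lock(health_check_mutex_);
    if (health_check_thread_.joinable())
    {
        // Reclaim the old thread first; reassigning a joinable std::thread calls std::terminate()
        health_check_thread_.join();
    }

    // Heartbeat thread
    health_check_thread_ = std::thread([this]()
    {
        try
        {
            while (is_health_check_thread_running_)
            {
                // Once the camera disconnects, SendHeartBeat() throws runtime_error: "camera [x] disconnected!"
                SendHeartBeat();
                std::this_thread::sleep_for(std::chrono::seconds(1));
            }
        }
        catch (const std::exception& e)
        {
            // State reset
            ...

            std::thread([this]()
            {
                NotifyEventHandeler(ConfigurationEvent::OnCameraDeviceRemoved);
            }).detach(); // Notify asynchronously so the reconnect does not run on this thread
            return;
        }
    });
}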

void CameraDevice::StopHealthCheckThread()
{
    // State reset
    ...

    std::thread wait_thread([this]()
    {
        // Prevent a double join with StartHealthCheckThread()
        std::lock_guard<std::mutex> lock(health_check_mutex_);
        if (health_check_thread_.joinable())
        {
            health_check_thread_.join();
        }
    });
    wait_thread.detach();
}
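The two changes can also be checked in isolation. Below is a minimal sketch that applies them to the reduced test program rather than to `CameraDevice`. It illustrates the idea under assumptions and is not the author's code. The globals, `kFailAt`, the `restarts` counter and the millisecond timings are all hypothetical. Joining the leftover thread directly in `StartHealthCheck()` stands in for the detached waiter.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <stdexcept>
#include <thread>

constexpr int kFailAt = 3;          // heartbeat number that simulates a disconnect
std::atomic<bool> running{false};
std::atomic<int> heartbeats{0};
std::atomic<int> restarts{0};       // only here so the behaviour can be checked
std::mutex health_mutex;            // guards health_thread
std::thread health_thread;

void StartHealthCheck();

void OnDeviceRemoved()              // runs on its own detached notifier thread
{
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // simulated reconnect
    StartHealthCheck();
    ++restarts;                     // counted only after the restart has completed
}

void StartHealthCheck()
{
    std::lock_guard<std::mutex> lock(health_mutex);
    if (health_thread.joinable()) {
        health_thread.join();       // the old thread has returned (or is about to): reclaim it
    }
    assert(!health_thread.joinable());  // otherwise operator= below would call std::terminate()
    running = true;
    health_thread = std::thread([] {
        try {
            while (running) {
                if (++heartbeats == kFailAt) {
                    throw std::runtime_error("Simulated heartbeat failure");
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
            }
        } catch (const std::exception&) {
            running = false;
            std::thread(OnDeviceRemoved).detach();  // asynchronous notification
        }                                           // this thread then ends promptly
    });
}

void StopHealthCheck()
{
    std::lock_guard<std::mutex> lock(health_mutex);
    running = false;
    if (health_thread.joinable()) {
        health_thread.join();
    }
}
```

To drive it from `main`, call `StartHealthCheck()`, poll until `restarts` reaches 1, then call `StopHealthCheck()`. The sketch should then survive the simulated disconnect instead of aborting, because the thread that throws never reassigns `health_thread` itself.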

I also drew a simple sequence diagram for the revised design.

  • Title: Analysis of a Crash Caused by Reassigning a Joinable Thread Object
  • Author: paw5zx
  • Created: 2025-07-23 15:12:19
  • Updated: 2025-07-23 23:01:54
  • Link: https://paw5zx.github.io/issue-and-solution-01-cpp-thread-crash/
  • License: This article is licensed under CC BY-NC-SA 4.0.