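Preface

In our camera disconnect-and-reconnect design, CameraDevice::StartHealthCheckThread() and CameraDevice::StopHealthCheckThread() were poorly designed. As a result, the program crashed every time the camera reconnected successfully, reporting: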
```
[2025-06-20 00:51:40,939](ERROR)<13169terminate called after throwing an instance of 'std::runtime_error'
  what():  camera [0] disconnected!
```
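From the message alone, it looks as if the thrown exception was never caught and the program was therefore killed by std::terminate(). That is not what actually happened.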
The flawed design

First, the old design. Only the functions directly involved in the crash are shown, starting with the reconnect event handler:
```cpp
class ReconnectHandler : public ConfigurationEventHandler {
public:
    void OnCameraDeviceRemoved(CameraDevice& camera) override {
        camera.StopGrabbing();
        camera.CloseDevice();
        camera.OpenDevice();
    }

    void OnOpened(CameraDevice& camera) override {
        ...
        camera.StartHealthCheckThread();
    }

    void OnClose(CameraDevice& camera) override {
        ...
        camera.StopHealthCheckThread();
    }
};
```
Next, StartHealthCheckThread() and StopHealthCheckThread():
```cpp
void CameraDevice::StartHealthCheckThread() {
    ...
    health_check_thread_ = std::thread([this]() {
        try {
            while (is_health_check_thread_running_) {
                SendHeartBeat();
                std::this_thread::sleep_for(std::chrono::seconds(1));
            }
        } catch (const std::exception& e) {
            ...
            NotifyEventHandeler(ConfigurationEvent::OnCameraDeviceRemoved);
            return;
        }
    });
}

void CameraDevice::StopHealthCheckThread() {
    ...
    std::thread wait_thread([this]() {
        if (health_check_thread_.joinable()) {
            health_check_thread_.join();
        }
    });
    wait_thread.detach();
}
```
Root-cause analysis

I drew a simple sequence diagram of how the threads interact.
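The root cause is the catch block in the heartbeat thread. It calls NotifyEventHandeler() synchronously, so the entire reconnect sequence (StopGrabbing(), CloseDevice(), OpenDevice()) runs on the heartbeat thread itself. Once the camera reconnects successfully, OnOpened() is invoked on that same call stack, and it calls StartHealthCheckThread().

At that moment the old heartbeat thread cannot have exited, because it is the very thread executing this code. For the same reason, any join started by StopHealthCheckThread() cannot have completed either. Assigning a new thread to health_check_thread_ while it still owns the old one crashes the program immediately.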
To confirm the cause, I reduced the flow to a small test program:
```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <stdexcept>
#include <thread>

void StartHealthCheckThread();
void StopHealthCheckThread();

std::atomic<bool> is_health_check_thread_running_(false);
std::thread health_check_thread_;

// Simulates a heartbeat that fails on the sixth call.
void SendHeartBeat() {
    static int count = 0;
    count++;
    std::cout << "Sending heartbeat... Count: " << count << std::endl;
    if (count == 6) {
        throw std::runtime_error("Simulated heartbeat failure");
    }
}

// Reconnect handler, called synchronously from the heartbeat thread.
void OnCameraDeviceRemoved() {
    StopHealthCheckThread();
    std::this_thread::sleep_for(std::chrono::seconds(5));
    StartHealthCheckThread();
}

void StopHealthCheckThread() {
    std::cout << "CameraDevice::StopHealthCheckThread" << std::endl;
    is_health_check_thread_running_ = false;
    std::thread wait_thread([]() {
        if (health_check_thread_.joinable()) {
            health_check_thread_.join();
        }
        std::cout << "CameraDevice::StopHealthCheckThread thread stopped" << std::endl;
    });
    wait_thread.detach();
    std::cout << "CameraDevice::StopHealthCheckThread thread detached" << std::endl;
}

void StartHealthCheckThread() {
    std::cout << "CameraDevice::StartHealthCheckThread" << std::endl;
    if (is_health_check_thread_running_) {
        return;
    }
    is_health_check_thread_running_ = true;
    health_check_thread_ = std::thread([]() {
        std::cout << "entering health check thread..." << std::endl;
        try {
            while (is_health_check_thread_running_) {
                SendHeartBeat();
                std::this_thread::sleep_for(std::chrono::seconds(1));
            }
        } catch (const std::exception& e) {
            std::cerr << "CameraDevice::StartHealthCheckThread exception: " << e.what() << std::endl;
            is_health_check_thread_running_ = false;
            OnCameraDeviceRemoved();
            return;
        }
    });
}

int main() {
    StartHealthCheckThread();
    while (true) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```
Running it under gdb produces:
```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
CameraDevice::StartHealthCheckThread
[New Thread 0x7ffff798c700 (LWP 1305860)]
entering health check thread...
Sending heartbeat... Count: 1
Sending heartbeat... Count: 2
Sending heartbeat... Count: 3
Sending heartbeat... Count: 4
Sending heartbeat... Count: 5
Sending heartbeat... Count: 6
CameraDevice::StartHealthCheckThread exception: Simulated heartbeat failure
CameraDevice::StopHealthCheckThread
[New Thread 0x7ffff718b700 (LWP 1305862)]
CameraDevice::StopHealthCheckThread thread detached
CameraDevice::StartHealthCheckThread
[New Thread 0x7ffff698a700 (LWP 1305866)]
entering health check thread...
Sending heartbeat... Count: 7
terminate called after throwing an instance of 'std::runtime_error'
  what():  Simulated heartbeat failure

Thread 2 "test.out" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff798c700 (LWP 1305860)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
```
The `bt` command shows the call stack:
```
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7b03859 in __GI_abort () at abort.c:79
#2  0x00007ffff7d9cee6 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7daef8c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7daeff7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00005555555571a3 in std::thread::operator= (this=0x55555555b280 <health_check_thread_>, __t=...)
    at /usr/include/c++/11/bits/std_thread.h:165
#6  0x0000555555556834 in StartHealthCheckThread () at test.cpp:87
#7  0x00005555555564f3 in OnCameraDeviceRemoved () at test.cpp:34
#8  0x0000555555556765 in operator() (__closure=0x55555556e6c8) at test.cpp:84
#9  0x0000555555556fe5 in std::__invoke_impl<void, StartHealthCheckThread()::<lambda()> >(std::__invoke_other, struct {...} &&) (
    __f=...) at /usr/include/c++/11/bits/invoke.h:61
```
In this backtrace:
```
#5  0x00005555555571a3 in std::thread::operator= (this=0x55555555b280 <health_check_thread_>, __t=...)
    at /usr/include/c++/11/bits/std_thread.h:165
```
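This frame shows that the function being executed is std::thread::operator=, the move-assignment operator of std::thread. It is in the middle of assigning a new std::thread object to health_check_thread_.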
```
#4  0x00007ffff7daeff7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
```
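This frame shows what happened inside that assignment. health_check_thread_ still owned a joinable thread, so the assignment operator called std::terminate() directly and ended the program. This is exactly what the standard requires: std::thread's move assignment calls std::terminate() if *this is joinable().

Frames #6 to #8 also confirm the call chain. The heartbeat lambda (test.cpp:84) called OnCameraDeviceRemoved(), which in turn called StartHealthCheckThread(). All of this happened on the original heartbeat thread, which is the thread gdb switched to (LWP 1305860).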
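A follow-up question

We now know why the program crashes, but one question remains. If the program was ended by std::terminate(), why does the last message mention the std::runtime_error thrown by SendHeartBeat()?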
```
terminate called after throwing an instance of 'std::runtime_error'
  what():  Simulated heartbeat failure
```
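The answer lies in the lifetime of the exception object. A caught exception object is destroyed when its catch block finishes, at the closing brace. In this example, however, std::terminate() was triggered before the catch block finished, so the exception was still the one currently being handled.

When std::terminate() runs, libstdc++'s default terminate handler (__gnu_cxx::__verbose_terminate_handler) checks whether such an active exception exists and chooses its message accordingly. Here it finds the std::runtime_error thrown earlier, so it prints the exception type together with its what() message: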
```
terminate called after throwing an instance of 'std::runtime_error'
  what():  Simulated heartbeat failure
```
A simple example confirms this:
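```cpp
#include <exception>
#include <stdexcept>

// Terminates while no exception is being handled.
void TerminateFunc() {
    std::terminate();
}

// Terminates from inside a catch block, while the exception is still being handled.
void TryFunc() {
    try {
        throw std::runtime_error("Simulated error in main function");
    } catch (const std::exception&) {
        std::terminate();
    }
}

int main() {
    TryFunc();  // replace with TerminateFunc() to compare the two messages
    return 0;
}
```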
Calling TryFunc() prints:

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  Simulated error in main function
```

Calling TerminateFunc() instead prints:

```
terminate called without an active exception
```

Both results match the expectation.
Solution

The core problem is to make sure the old thread has been safely reclaimed (joined) before a new thread is assigned to the std::thread object.
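That requires two changes:

1. Notify OnCameraDeviceRemoved asynchronously. The old heartbeat thread can then finish promptly instead of running the reconnect sequence itself.
2. Before creating a new heartbeat thread, check health_check_thread_.joinable() and join the old thread if it is still joinable.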
Accordingly, StartHealthCheckThread() and StopHealthCheckThread() become:
```cpp
void CameraDevice::StartHealthCheckThread() {
    ...
    std::lock_guard<std::mutex> lock(health_check_mutex_);
    // Reclaim the previous heartbeat thread first: move-assigning to a
    // std::thread that is still joinable() calls std::terminate().
    if (health_check_thread_.joinable()) {
        health_check_thread_.join();
    }
    health_check_thread_ = std::thread([this]() {
        try {
            while (is_health_check_thread_running_) {
                SendHeartBeat();
                std::this_thread::sleep_for(std::chrono::seconds(1));
            }
        } catch (const std::exception& e) {
            ...
            // Notify on a detached thread, so this heartbeat thread returns
            // right away instead of running the reconnect sequence itself.
            std::thread([this]() {
                NotifyEventHandeler(ConfigurationEvent::OnCameraDeviceRemoved);
            }).detach();
            return;
        }
    });
}

void CameraDevice::StopHealthCheckThread() {
    ...
    // Join in the background; the mutex serializes this with StartHealthCheckThread().
    std::thread wait_thread([this]() {
        std::lock_guard<std::mutex> lock(health_check_mutex_);
        if (health_check_thread_.joinable()) {
            health_check_thread_.join();
        }
    });
    wait_thread.detach();
}
```
I also drew a simple sequence diagram for the revised design.