In the Linux kernel, the following vulnerability has been resolved:
clk: Get runtime PM before walking tree during disable_unused
Doug reported [1] the following hung task:
INFO: task swapper/0:1 blocked for more tha...Show moreIn the Linux kernel, the following vulnerability has been resolved:
clk: Get runtime PM before walking tree during disable_unused
Doug reported [1] the following hung task:
INFO: task swapper/0:1 blocked for more than 122 seconds.
Not tainted 5.15.149-21875-gf795ebc40eb8 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008
Call trace:
__switch_to+0xf4/0x1f4
__schedule+0x418/0xb80
schedule+0x5c/0x10c
rpm_resume+0xe0/0x52c
rpm_resume+0x178/0x52c
__pm_runtime_resume+0x58/0x98
clk_pm_runtime_get+0x30/0xb0
clk_disable_unused_subtree+0x58/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused_subtree+0x38/0x208
clk_disable_unused+0x4c/0xe4
do_one_initcall+0xcc/0x2d8
do_initcall_level+0xa4/0x148
do_initcalls+0x5c/0x9c
do_basic_setup+0x24/0x30
kernel_init_freeable+0xec/0x164
kernel_init+0x28/0x120
ret_from_fork+0x10/0x20
INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
Not tainted 5.15.149-21875-gf795ebc40eb8 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008
Workqueue: events_unbound deferred_probe_work_func
Call trace:
__switch_to+0xf4/0x1f4
__schedule+0x418/0xb80
schedule+0x5c/0x10c
schedule_preempt_disabled+0x2c/0x48
__mutex_lock+0x238/0x488
__mutex_lock_slowpath+0x1c/0x28
mutex_lock+0x50/0x74
clk_prepare_lock+0x7c/0x9c
clk_core_prepare_lock+0x20/0x44
clk_prepare+0x24/0x30
clk_bulk_prepare+0x40/0xb0
mdss_runtime_resume+0x54/0x1c8
pm_generic_runtime_resume+0x30/0x44
__genpd_runtime_resume+0x68/0x7c
genpd_runtime_resume+0x108/0x1f4
__rpm_callback+0x84/0x144
rpm_callback+0x30/0x88
rpm_resume+0x1f4/0x52c
rpm_resume+0x178/0x52c
__pm_runtime_resume+0x58/0x98
__device_attach+0xe0/0x170
device_initial_probe+0x1c/0x28
bus_probe_device+0x3c/0x9c
device_add+0x644/0x814
mipi_dsi_device_register_full+0xe4/0x170
devm_mipi_dsi_device_register_full+0x28/0x70
ti_sn_bridge_probe+0x1dc/0x2c0
auxiliary_bus_probe+0x4c/0x94
really_probe+0xcc/0x2c8
__driver_probe_device+0xa8/0x130
driver_probe_device+0x48/0x110
__device_attach_driver+0xa4/0xcc
bus_for_each_drv+0x8c/0xd8
__device_attach+0xf8/0x170
device_initial_probe+0x1c/0x28
bus_probe_device+0x3c/0x9c
deferred_probe_work_func+0x9c/0xd8
process_one_work+0x148/0x518
worker_thread+0x138/0x350
kthread+0x138/0x1e0
ret_from_fork+0x10/0x20
The first thread is walking the clk tree and calling
clk_pm_runtime_get() to power on devices required to read the clk
hardware via struct clk_ops::is_enabled(). This thread holds the clk
prepare_lock, and is trying to runtime PM resume a device, when it finds
that the device is in the process of resuming so the thread schedule()s
away waiting for the device to finish resuming before continuing. The
second thread is runtime PM resuming the same device, but the runtime
resume callback is calling clk_prepare(), trying to grab the
prepare_lock waiting on the first thread.
This is a classic ABBA deadlock. To properly fix the deadlock, we must
never runtime PM resume or suspend a device with the clk prepare_lock
held. Actually doing that is near impossible today because the global
prepare_lock would have to be dropped in the middle of the tree, the
device runtime PM resumed/suspended, and then the prepare_lock grabbed
again to ensure consistency of the clk tree topology. If anything
changes with the clk tree in the meantime, we've lost and will need to
start the operation all over again.
Luckily, most of the time we're simply incrementing or decrementing the
runtime PM count on an active device, so we don't have the chance to
schedule away with the prepare_lock held. Let's fix this immediate
problem that can be
---truncated---Show less |
In the Linux kernel, the following vulnerability has been resolved:
clk: Get runtime PM before walking tree for clk_summary
Similar to the previous commit, we should make sure that all devices are
runtime resumed befor...Show moreIn the Linux kernel, the following vulnerability has been resolved:
clk: Get runtime PM before walking tree for clk_summary
Similar to the previous commit, we should make sure that all devices are
runtime resumed before printing the clk_summary through debugfs. Failure
to do so would result in a deadlock if the thread is resuming a device
to print clk state and that device is also runtime resuming in another
thread, e.g the screen is turning on and the display driver is starting
up. We remove the calls to clk_pm_runtime_{get,put}() in this path
because they're superfluous now that we know the devices are runtime
resumed. This also squashes a bug where the return value of
clk_pm_runtime_get() wasn't checked, leading to an RPM count underflow
on error paths.Show less |
In the Linux kernel, the following vulnerability has been resolved:
clk: mediatek: Do a runtime PM get on controllers during probe
mt8183-mfgcfg has a mutual dependency with genpd during the probing
stage, which leads...Show moreIn the Linux kernel, the following vulnerability has been resolved:
clk: mediatek: Do a runtime PM get on controllers during probe
mt8183-mfgcfg has a mutual dependency with genpd during the probing
stage, which leads to a deadlock in the following call stack:
CPU0: genpd_lock --> clk_prepare_lock
genpd_power_off_work_fn()
genpd_lock()
generic_pm_domain::power_off()
clk_unprepare()
clk_prepare_lock()
CPU1: clk_prepare_lock --> genpd_lock
clk_register()
__clk_core_init()
clk_prepare_lock()
clk_pm_runtime_get()
genpd_lock()
Do a runtime PM get at the probe function to make sure clk_register()
won't acquire the genpd lock. Instead of only modifying mt8183-mfgcfg,
do this on all mediatek clock controller probings because we don't
believe this would cause any regression.
Verified on MT8183 and MT8192 Chromebooks.Show less |
In the Linux kernel, the following vulnerability has been resolved:
serial/pmac_zilog: Remove flawed mitigation for rx irq flood
The mitigation was intended to stop the irq completely. That may be
better than a hard lo...Show moreIn the Linux kernel, the following vulnerability has been resolved:
serial/pmac_zilog: Remove flawed mitigation for rx irq flood
The mitigation was intended to stop the irq completely. That may be
better than a hard lock-up but it turns out that you get a crash anyway
if you're using pmac_zilog as a serial console:
ttyPZ0: pmz: rx irq flood !
BUG: spinlock recursion on CPU#0, swapper/0
That's because the pr_err() call in pmz_receive_chars() results in
pmz_console_write() attempting to lock a spinlock already locked in
pmz_interrupt(). With CONFIG_DEBUG_SPINLOCK=y, this produces a fatal
BUG splat. The spinlock in question is the one in struct uart_port.
Even when it's not fatal, the serial port rx function ceases to work.
Also, the iteration limit doesn't play nicely with QEMU, as can be
seen in the bug report linked below.
A web search for other reports of the error message "pmz: rx irq flood"
didn't produce anything. So I don't think this code is needed any more.
Remove it.Show less |
In the Linux kernel, the following vulnerability has been resolved:
mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
When I did hard offline test with hugetlb pages, below deadlock occurs:
====...Show moreIn the Linux kernel, the following vulnerability has been resolved:
mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
When I did hard offline test with hugetlb pages, below deadlock occurs:
======================================================
WARNING: possible circular locking dependency detected
6.8.0-11409-gf6cef5f8c37f #1 Not tainted
------------------------------------------------------
bash/46904 is trying to acquire lock:
ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60
but task is already holding lock:
ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (pcp_batch_high_lock){+.+.}-{3:3}:
__mutex_lock+0x6c/0x770
page_alloc_cpu_online+0x3c/0x70
cpuhp_invoke_callback+0x397/0x5f0
__cpuhp_invoke_callback_range+0x71/0xe0
_cpu_up+0xeb/0x210
cpu_up+0x91/0xe0
cpuhp_bringup_mask+0x49/0xb0
bringup_nonboot_cpus+0xb7/0xe0
smp_init+0x25/0xa0
kernel_init_freeable+0x15f/0x3e0
kernel_init+0x15/0x1b0
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
__lock_acquire+0x1298/0x1cd0
lock_acquire+0xc0/0x2b0
cpus_read_lock+0x2a/0xc0
static_key_slow_dec+0x16/0x60
__hugetlb_vmemmap_restore_folio+0x1b9/0x200
dissolve_free_huge_page+0x211/0x260
__page_handle_poison+0x45/0xc0
memory_failure+0x65e/0xc70
hard_offline_page_store+0x55/0xa0
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x387/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xca/0x1e0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pcp_batch_high_lock);
lock(cpu_hotplug_lock);
lock(pcp_batch_high_lock);
rlock(cpu_hotplug_lock);
*** DEADLOCK ***
5 locks held by bash/46904:
#0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
#1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
#2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
#3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70
#4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
stack backtrace:
CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
check_noncircular+0x129/0x140
__lock_acquire+0x1298/0x1cd0
lock_acquire+0xc0/0x2b0
cpus_read_lock+0x2a/0xc0
static_key_slow_dec+0x16/0x60
__hugetlb_vmemmap_restore_folio+0x1b9/0x200
dissolve_free_huge_page+0x211/0x260
__page_handle_poison+0x45/0xc0
memory_failure+0x65e/0xc70
hard_offline_page_store+0x55/0xa0
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x387/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xca/0x1e0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7fc862314887
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887
RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001
RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00
In short, below scene breaks the
---truncated---Show less |
In the Linux kernel, the following vulnerability has been resolved:
dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape
For raid456, if reshape is still in progress, then IO across re...Show moreIn the Linux kernel, the following vulnerability has been resolved:
dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape
For raid456, if reshape is still in progress, then IO across reshape
position will wait for reshape to make progress. However, for dm-raid,
in following cases reshape will never make progress hence IO will hang:
1) the array is read-only;
2) MD_RECOVERY_WAIT is set;
3) MD_RECOVERY_FROZEN is set;
After commit c467e97f079f ("md/raid6: use valid sector values to determine
if an I/O should wait on the reshape") fix the problem that IO across
reshape position doesn't wait for reshape, the dm-raid test
shell/lvconvert-raid-reshape.sh start to hang:
[root@fedora ~]# cat /proc/979/stack
[<0>] wait_woken+0x7d/0x90
[<0>] raid5_make_request+0x929/0x1d70 [raid456]
[<0>] md_handle_request+0xc2/0x3b0 [md_mod]
[<0>] raid_map+0x2c/0x50 [dm_raid]
[<0>] __map_bio+0x251/0x380 [dm_mod]
[<0>] dm_submit_bio+0x1f0/0x760 [dm_mod]
[<0>] __submit_bio+0xc2/0x1c0
[<0>] submit_bio_noacct_nocheck+0x17f/0x450
[<0>] submit_bio_noacct+0x2bc/0x780
[<0>] submit_bio+0x70/0xc0
[<0>] mpage_readahead+0x169/0x1f0
[<0>] blkdev_readahead+0x18/0x30
[<0>] read_pages+0x7c/0x3b0
[<0>] page_cache_ra_unbounded+0x1ab/0x280
[<0>] force_page_cache_ra+0x9e/0x130
[<0>] page_cache_sync_ra+0x3b/0x110
[<0>] filemap_get_pages+0x143/0xa30
[<0>] filemap_read+0xdc/0x4b0
[<0>] blkdev_read_iter+0x75/0x200
[<0>] vfs_read+0x272/0x460
[<0>] ksys_read+0x7a/0x170
[<0>] __x64_sys_read+0x1c/0x30
[<0>] do_syscall_64+0xc6/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74
This is because reshape can't make progress.
For md/raid, the problem doesn't exist because register new sync_thread
doesn't rely on the IO to be done any more:
1) If array is read-only, it can switch to read-write by ioctl/sysfs;
2) md/raid never set MD_RECOVERY_WAIT;
3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold
'reconfig_mutex', hence it can be cleared and reshape can continue by
sysfs api 'sync_action'.
However, I'm not sure yet how to avoid the problem in dm-raid yet. This
patch on the one hand make sure raid_message() can't change
sync_thread() through raid_message() after presuspend(), on the other
hand detect the above 3 cases before wait for IO do be done in
dm_suspend(), and let dm-raid requeue those IO.Show less |
In the Linux kernel, the following vulnerability has been resolved:
USB: core: Fix deadlock in usb_deauthorize_interface()
Among the attribute file callback routines in
drivers/usb/core/sysfs.c, the interface_authorize...Show moreIn the Linux kernel, the following vulnerability has been resolved:
USB: core: Fix deadlock in usb_deauthorize_interface()
Among the attribute file callback routines in
drivers/usb/core/sysfs.c, the interface_authorized_store() function is
the only one which acquires a device lock on an ancestor device: It
calls usb_deauthorize_interface(), which locks the interface's parent
USB device.
The will lead to deadlock if another process already owns that lock
and tries to remove the interface, whether through a configuration
change or because the device has been disconnected. As part of the
removal procedure, device_del() waits for all ongoing sysfs attribute
callbacks to complete. But usb_deauthorize_interface() can't complete
until the device lock has been released, and the lock won't be
released until the removal has finished.
The mechanism provided by sysfs to prevent this kind of deadlock is
to use the sysfs_break_active_protection() function, which tells sysfs
not to wait for the attribute callback.
Reported-and-tested by: Yue Sun <samsun1006219@gmail.com>
Reported by: xingwei lee <xrivendell7@gmail.com>Show less |
In the Linux kernel, the following vulnerability has been resolved:
USB: core: Fix deadlock in port "disable" sysfs attribute
The show and store callback routines for the "disable" sysfs attribute
file in port.c acquir...Show moreIn the Linux kernel, the following vulnerability has been resolved:
USB: core: Fix deadlock in port "disable" sysfs attribute
The show and store callback routines for the "disable" sysfs attribute
file in port.c acquire the device lock for the port's parent hub
device. This can cause problems if another process has locked the hub
to remove it or change its configuration:
Removing the hub or changing its configuration requires the
hub interface to be removed, which requires the port device
to be removed, and device_del() waits until all outstanding
sysfs attribute callbacks for the ports have returned. The
lock can't be released until then.
But the disable_show() or disable_store() routine can't return
until after it has acquired the lock.
The resulting deadlock can be avoided by calling
sysfs_break_active_protection(). This will cause the sysfs core not
to wait for the attribute's callback routine to return, allowing the
removal to proceed. The disadvantage is that after making this call,
there is no guarantee that the hub structure won't be deallocated at
any moment. To prevent this, we have to acquire a reference to it
first by calling hub_get().Show less |
In the Linux kernel, the following vulnerability has been resolved:
drm/gma500: Fix BUG: sleeping function called from invalid context errors
gma_crtc_page_flip() was holding the event_lock spinlock while calling
crtc_...Show moreIn the Linux kernel, the following vulnerability has been resolved:
drm/gma500: Fix BUG: sleeping function called from invalid context errors
gma_crtc_page_flip() was holding the event_lock spinlock while calling
crtc_funcs->mode_set_base() which takes ww_mutex.
The only reason to hold event_lock is to clear gma_crtc->page_flip_event
on mode_set_base() errors.
Instead unlock it after setting gma_crtc->page_flip_event and on
errors re-take the lock and clear gma_crtc->page_flip_event it
it is still set.
This fixes the following WARN/stacktrace:
[ 512.122953] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:870
[ 512.123004] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1253, name: gnome-shell
[ 512.123031] preempt_count: 1, expected: 0
[ 512.123048] RCU nest depth: 0, expected: 0
[ 512.123066] INFO: lockdep is turned off.
[ 512.123080] irq event stamp: 0
[ 512.123094] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 512.123134] hardirqs last disabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
[ 512.123176] softirqs last enabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
[ 512.123207] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 512.123233] Preemption disabled at:
[ 512.123241] [<0000000000000000>] 0x0
[ 512.123275] CPU: 3 PID: 1253 Comm: gnome-shell Tainted: G W 5.19.0+ #1
[ 512.123304] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013
[ 512.123323] Call Trace:
[ 512.123346] <TASK>
[ 512.123370] dump_stack_lvl+0x5b/0x77
[ 512.123412] __might_resched.cold+0xff/0x13a
[ 512.123458] ww_mutex_lock+0x1e/0xa0
[ 512.123495] psb_gem_pin+0x2c/0x150 [gma500_gfx]
[ 512.123601] gma_pipe_set_base+0x76/0x240 [gma500_gfx]
[ 512.123708] gma_crtc_page_flip+0x95/0x130 [gma500_gfx]
[ 512.123808] drm_mode_page_flip_ioctl+0x57d/0x5d0
[ 512.123897] ? drm_mode_cursor2_ioctl+0x10/0x10
[ 512.123936] drm_ioctl_kernel+0xa1/0x150
[ 512.123984] drm_ioctl+0x21f/0x420
[ 512.124025] ? drm_mode_cursor2_ioctl+0x10/0x10
[ 512.124070] ? rcu_read_lock_bh_held+0xb/0x60
[ 512.124104] ? lock_release+0x1ef/0x2d0
[ 512.124161] __x64_sys_ioctl+0x8d/0xd0
[ 512.124203] do_syscall_64+0x58/0x80
[ 512.124239] ? do_syscall_64+0x67/0x80
[ 512.124267] ? trace_hardirqs_on_prepare+0x55/0xe0
[ 512.124300] ? do_syscall_64+0x67/0x80
[ 512.124340] ? rcu_read_lock_sched_held+0x10/0x80
[ 512.124377] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 512.124411] RIP: 0033:0x7fcc4a70740f
[ 512.124442] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 512.124470] RSP: 002b:00007ffda73f5390 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 512.124503] RAX: ffffffffffffffda RBX: 000055cc9e474500 RCX: 00007fcc4a70740f
[ 512.124524] RDX: 00007ffda73f5420 RSI: 00000000c01864b0 RDI: 0000000000000009
[ 512.124544] RBP: 00007ffda73f5420 R08: 000055cc9c0b0cb0 R09: 0000000000000034
[ 512.124564] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01864b0
[ 512.124584] R13: 0000000000000009 R14: 000055cc9df484d0 R15: 000055cc9af5d0c0
[ 512.124647] </TASK>Show less |
Vyper is a pythonic Smart Contract Language for the Ethereum virtual machine. Prior to version 0.3.0, default functions don't respect nonreentrancy keys and the lock isn't emitted. No vulnerable production contracts were...Show moreVyper is a pythonic Smart Contract Language for the Ethereum virtual machine. Prior to version 0.3.0, default functions don't respect nonreentrancy keys and the lock isn't emitted. No vulnerable production contracts were found. Additionally, using a lock on a `default` function is a very sparsely used pattern. As such, the impact is low. Version 0.3.0 contains a patch for the issue.
Show less |
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
The commit mutex should not be released during the critical section
between n...Show moreIn the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
The commit mutex should not be released during the critical section
between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC
worker could collect expired objects and get the released commit lock
within the same GC sequence.
nf_tables_module_autoload() temporarily releases the mutex to load
module dependencies, then it goes back to replay the transaction again.
Move it at the end of the abort phase after nft_gc_seq_end() is called.Show less |
In the Linux kernel, the following vulnerability has been resolved:
Revert "drm/amd: flush any delayed gfxoff on suspend entry"
commit ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring
callbacks") caused GFXOFF...Show moreIn the Linux kernel, the following vulnerability has been resolved:
Revert "drm/amd: flush any delayed gfxoff on suspend entry"
commit ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring
callbacks") caused GFXOFF control to be used more heavily and the
codepath that was removed from commit 0dee72639533 ("drm/amd: flush any
delayed gfxoff on suspend entry") now can be exercised at suspend again.
Users report that by using GNOME to suspend the lockscreen trigger will
cause SDMA traffic and the system can deadlock.
This reverts commit 0dee726395333fea833eaaf838bc80962df886c8.Show less |
In the Linux kernel, the following vulnerability has been resolved:
block: fix deadlock between bd_link_disk_holder and partition scan
'open_mutex' of gendisk is used to protect open/close block devices. But
in bd_link...Show moreIn the Linux kernel, the following vulnerability has been resolved:
block: fix deadlock between bd_link_disk_holder and partition scan
'open_mutex' of gendisk is used to protect open/close block devices. But
in bd_link_disk_holder(), it is used to protect the creation of symlink
between holding disk and slave bdev, which introduces some issues.
When bd_link_disk_holder() is called, the driver is usually in the process
of initialization/modification and may suspend submitting io. At this
time, any io hold 'open_mutex', such as scanning partitions, can cause
deadlocks. For example, in raid:
T1 T2
bdev_open_by_dev
lock open_mutex [1]
...
efi_partition
...
md_submit_bio
md_ioctl mddev_syspend
-> suspend all io
md_add_new_disk
bind_rdev_to_array
bd_link_disk_holder
try lock open_mutex [2]
md_handle_request
-> wait mddev_resume
T1 scan partition, T2 add a new device to raid. T1 waits for T2 to resume
mddev, but T2 waits for open_mutex held by T1. Deadlock occurs.
Fix it by introducing a local mutex 'blk_holder_mutex' to replace
'open_mutex'.Show less |
In the Linux kernel, the following vulnerability has been resolved:
scsi: hisi_sas: Fix a deadlock issue related to automatic dump
If we issue a disabling PHY command, the device attached with it will go
offline, if a...Show moreIn the Linux kernel, the following vulnerability has been resolved:
scsi: hisi_sas: Fix a deadlock issue related to automatic dump
If we issue a disabling PHY command, the device attached with it will go
offline, if a 2 bit ECC error occurs at the same time, a hung task may be
found:
[ 4613.652388] INFO: task kworker/u256:0:165233 blocked for more than 120 seconds.
[ 4613.666297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.674809] task:kworker/u256:0 state:D stack: 0 pid:165233 ppid: 2 flags:0x00000208
[ 4613.683959] Workqueue: 0000:74:02.0_disco_q sas_revalidate_domain [libsas]
[ 4613.691518] Call trace:
[ 4613.694678] __switch_to+0xf8/0x17c
[ 4613.698872] __schedule+0x660/0xee0
[ 4613.703063] schedule+0xac/0x240
[ 4613.706994] schedule_timeout+0x500/0x610
[ 4613.711705] __down+0x128/0x36c
[ 4613.715548] down+0x240/0x2d0
[ 4613.719221] hisi_sas_internal_abort_timeout+0x1bc/0x260 [hisi_sas_main]
[ 4613.726618] sas_execute_internal_abort+0x144/0x310 [libsas]
[ 4613.732976] sas_execute_internal_abort_dev+0x44/0x60 [libsas]
[ 4613.739504] hisi_sas_internal_task_abort_dev.isra.0+0xbc/0x1b0 [hisi_sas_main]
[ 4613.747499] hisi_sas_dev_gone+0x174/0x250 [hisi_sas_main]
[ 4613.753682] sas_notify_lldd_dev_gone+0xec/0x2e0 [libsas]
[ 4613.759781] sas_unregister_common_dev+0x4c/0x7a0 [libsas]
[ 4613.765962] sas_destruct_devices+0xb8/0x120 [libsas]
[ 4613.771709] sas_do_revalidate_domain.constprop.0+0x1b8/0x31c [libsas]
[ 4613.778930] sas_revalidate_domain+0x60/0xa4 [libsas]
[ 4613.784716] process_one_work+0x248/0x950
[ 4613.789424] worker_thread+0x318/0x934
[ 4613.793878] kthread+0x190/0x200
[ 4613.797810] ret_from_fork+0x10/0x18
[ 4613.802121] INFO: task kworker/u256:4:316722 blocked for more than 120 seconds.
[ 4613.816026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.824538] task:kworker/u256:4 state:D stack: 0 pid:316722 ppid: 2 flags:0x00000208
[ 4613.833670] Workqueue: 0000:74:02.0 hisi_sas_rst_work_handler [hisi_sas_main]
[ 4613.841491] Call trace:
[ 4613.844647] __switch_to+0xf8/0x17c
[ 4613.848852] __schedule+0x660/0xee0
[ 4613.853052] schedule+0xac/0x240
[ 4613.856984] schedule_timeout+0x500/0x610
[ 4613.861695] __down+0x128/0x36c
[ 4613.865542] down+0x240/0x2d0
[ 4613.869216] hisi_sas_controller_prereset+0x58/0x1fc [hisi_sas_main]
[ 4613.876324] hisi_sas_rst_work_handler+0x40/0x8c [hisi_sas_main]
[ 4613.883019] process_one_work+0x248/0x950
[ 4613.887732] worker_thread+0x318/0x934
[ 4613.892204] kthread+0x190/0x200
[ 4613.896118] ret_from_fork+0x10/0x18
[ 4613.900423] INFO: task kworker/u256:1:348985 blocked for more than 121 seconds.
[ 4613.914341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.922852] task:kworker/u256:1 state:D stack: 0 pid:348985 ppid: 2 flags:0x00000208
[ 4613.931984] Workqueue: 0000:74:02.0_event_q sas_port_event_worker [libsas]
[ 4613.939549] Call trace:
[ 4613.942702] __switch_to+0xf8/0x17c
[ 4613.946892] __schedule+0x660/0xee0
[ 4613.951083] schedule+0xac/0x240
[ 4613.955015] schedule_timeout+0x500/0x610
[ 4613.959725] wait_for_common+0x200/0x610
[ 4613.964349] wait_for_completion+0x3c/0x5c
[ 4613.969146] flush_workqueue+0x198/0x790
[ 4613.973776] sas_porte_broadcast_rcvd+0x1e8/0x320 [libsas]
[ 4613.979960] sas_port_event_worker+0x54/0xa0 [libsas]
[ 4613.985708] process_one_work+0x248/0x950
[ 4613.990420] worker_thread+0x318/0x934
[ 4613.994868] kthread+0x190/0x200
[ 4613.998800] ret_from_fork+0x10/0x18
This is because when the device goes offline, we obtain the hisi_hba
semaphore and send the ABORT_DEV command to the device. However, the
internal abort timed out due to the 2 bit ECC error and triggers automatic
dump. In addition, since the hisi_hba semaphore has been obtained, the dump
cannot be executed and the controller cannot be reset.
Therefore, the deadlocks occur on the following circular dependencies
---truncated---Show less |
In the Linux kernel, the following vulnerability has been resolved:
scsi: core: sysfs: Fix hang when device state is set via sysfs
This fixes a regression added with:
commit f0f82e2476f6 ("scsi: core: Fix capacity set...Show moreIn the Linux kernel, the following vulnerability has been resolved:
scsi: core: sysfs: Fix hang when device state is set via sysfs
This fixes a regression added with:
commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
offlinining device")
The problem is that after iSCSI recovery, iscsid will call into the kernel
to set the dev's state to running, and with that patch we now call
scsi_rescan_device() with the state_mutex held. If the SCSI error handler
thread is just starting to test the device in scsi_send_eh_cmnd() then it's
going to try to grab the state_mutex.
We are then stuck, because when scsi_rescan_device() tries to send its I/O
scsi_queue_rq() calls -> scsi_host_queue_ready() -> scsi_host_in_recovery()
which will return true (the host state is still in recovery) and I/O will
just be requeued. scsi_send_eh_cmnd() will then never be able to grab the
state_mutex to finish error handling.
To prevent the deadlock move the rescan-related code to after we drop the
state_mutex.
This also adds a check for if we are already in the running state. This
prevents extra scans and helps the iscsid case where if the transport class
has already onlined the device during its recovery process then we don't
need userspace to do it again plus possibly block that daemon.Show less |
In the Linux kernel, the following vulnerability has been resolved:
tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
When running ltp testcase(ltp/testcases/kernel/pty/pty04.c) with arm64, there is a soft lo...Show moreIn the Linux kernel, the following vulnerability has been resolved:
tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
When running ltp testcase(ltp/testcases/kernel/pty/pty04.c) with arm64, there is a soft lockup,
which look like this one:
Workqueue: events_unbound flush_to_ldisc
Call trace:
dump_backtrace+0x0/0x1ec
show_stack+0x24/0x30
dump_stack+0xd0/0x128
panic+0x15c/0x374
watchdog_timer_fn+0x2b8/0x304
__run_hrtimer+0x88/0x2c0
__hrtimer_run_queues+0xa4/0x120
hrtimer_interrupt+0xfc/0x270
arch_timer_handler_phys+0x40/0x50
handle_percpu_devid_irq+0x94/0x220
__handle_domain_irq+0x88/0xf0
gic_handle_irq+0x84/0xfc
el1_irq+0xc8/0x180
slip_unesc+0x80/0x214 [slip]
tty_ldisc_receive_buf+0x64/0x80
tty_port_default_receive_buf+0x50/0x90
flush_to_ldisc+0xbc/0x110
process_one_work+0x1d4/0x4b0
worker_thread+0x180/0x430
kthread+0x11c/0x120
In the testcase pty04, The first process call the write syscall to send
data to the pty master. At the same time, the workqueue will do the
flush_to_ldisc to pop data in a loop until there is no more data left.
When the sender and workqueue running in different core, the sender sends
data fastly in full time which will result in workqueue doing work in loop
for a long time and occuring softlockup in flush_to_ldisc with kernel
configured without preempt. So I add need_resched check and cond_resched
in the flush_to_ldisc loop to avoid it.Show less |
In the Linux kernel, the following vulnerability has been resolved:
spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
The ->runtime_suspend() and ->runtime_resume() callbacks are not
expe...Show moreIn the Linux kernel, the following vulnerability has been resolved:
spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
The ->runtime_suspend() and ->runtime_resume() callbacks are not
expected to call spi_controller_suspend() and spi_controller_resume().
Remove calls to those in the cadence-qspi driver.
Those helpers have two roles currently:
- They stop/start the queue, including dealing with the kworker.
- They toggle the SPI controller SPI_CONTROLLER_SUSPENDED flag. It
requires acquiring ctlr->bus_lock_mutex.
Step one is irrelevant because cadence-qspi is not queued. Step two
however has two implications:
- A deadlock occurs, because ->runtime_resume() is called in a context
where the lock is already taken (in the ->exec_op() callback, where
the usage count is incremented).
- It would disallow all operations once the device is auto-suspended.
Here is a brief call tree highlighting the mutex deadlock:
spi_mem_exec_op()
...
spi_mem_access_start()
mutex_lock(&ctlr->bus_lock_mutex)
cqspi_exec_mem_op()
pm_runtime_resume_and_get()
cqspi_resume()
spi_controller_resume()
mutex_lock(&ctlr->bus_lock_mutex)
...
spi_mem_access_end()
mutex_unlock(&ctlr->bus_lock_mutex)
...Show less |
In the Linux kernel, the following vulnerability has been resolved:
dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read
There is chip (ls1028a) errata:
The SoC may hang on 16 byte unaligned read transactio...Show moreIn the Linux kernel, the following vulnerability has been resolved:
dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read
There is chip (ls1028a) errata:
The SoC may hang on 16 byte unaligned read transactions by QDMA.
Unaligned read transactions initiated by QDMA may stall in the NOC
(Network On-Chip), causing a deadlock condition. Stalled transactions will
trigger completion timeouts in PCIe controller.
Workaround:
Enable prefetch by setting the source descriptor prefetchable bit
( SD[PF] = 1 ).
Implement this workaround.Show less |
In the Linux kernel, the following vulnerability has been resolved:
mptcp: fix possible deadlock in subflow diag
Syzbot and Eric reported a lockdep splat in the subflow diag:
WARNING: possible circular locking depe...Show moreIn the Linux kernel, the following vulnerability has been resolved:
mptcp: fix possible deadlock in subflow diag
Syzbot and Eric reported a lockdep splat in the subflow diag:
WARNING: possible circular locking dependency detected
6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted
syz-executor.2/24141 is trying to acquire lock:
ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
but task is already holding lock:
ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock
include/linux/spinlock.h:351 [inline]
ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at:
inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&h->lhash2[i].lock){+.+.}-{2:2}:
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743
inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261
__inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217
inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239
rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316
rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577
ops_init+0x352/0x610 net/core/net_namespace.c:136
__register_pernet_operations net/core/net_namespace.c:1214 [inline]
register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283
register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370
rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735
do_one_initcall+0x238/0x830 init/main.c:1236
do_initcall_level+0x157/0x210 init/main.c:1298
do_initcalls+0x3f/0x80 init/main.c:1314
kernel_init_freeable+0x42f/0x5d0 init/main.c:1551
kernel_init+0x1d/0x2a0 init/main.c:1441
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
-> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
lock_sock_fast include/net/sock.h:1723 [inline]
subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28
tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345
inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061
__inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263
inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371
netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264
__netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370
netlink_dump_start include/linux/netlink.h:338 [inline]
inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405
sock_diag_rcv_msg+0xe7/0x410
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
As noted by Eric we can break the lock dependency chain avoid
dumping
---truncated---Show less |
In the Linux kernel, the following vulnerability has been resolved:
aoe: avoid potential deadlock at set_capacity
Move set_capacity() outside of the section procected by (&d->lock).
To avoid possible interrupt unsafe l...Show moreIn the Linux kernel, the following vulnerability has been resolved:
aoe: avoid potential deadlock at set_capacity
Move set_capacity() outside of the section procected by (&d->lock).
To avoid possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
[1] lock(&bdev->bd_size_lock);
local_irq_disable();
[2] lock(&d->lock);
[3] lock(&bdev->bd_size_lock);
<Interrupt>
[4] lock(&d->lock);
*** DEADLOCK ***
Where [1](&bdev->bd_size_lock) hold by zram_add()->set_capacity().
[2]lock(&d->lock) hold by aoeblk_gdalloc(). And aoeblk_gdalloc()
is trying to acquire [3](&bdev->bd_size_lock) at set_capacity() call.
In this situation an attempt to acquire [4]lock(&d->lock) from
aoecmd_cfg_rsp() will lead to deadlock.
So the simplest solution is breaking lock dependency
[2](&d->lock) -> [3](&bdev->bd_size_lock) by moving set_capacity()
outside.Show less |