Changelog in Linux kernel 6.1.112

 
ALSA: hda/realtek - Fixed ALC256 headphone no sound
Author: Kailang Yang <kailang@realtek.com>
Date:   Thu Aug 22 10:54:19 2024 +0800

    ALSA: hda/realtek - Fixed ALC256 headphone no sound
    
    [ Upstream commit 9b82ff1362f50914c8292902e07be98a9f59d33d ]
    
    On Dell platforms, plugging in a headphone or headset could occasionally
    produce no sound from the headphone.
    Replacing the depop procedure solves this issue.
    
    Signed-off-by: Kailang Yang <kailang@realtek.com>
    Link: https://lore.kernel.org/bb8e2de30d294dc287944efa0667685a@realtek.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: hda/realtek - FIxed ALC285 headphone no sound
Author: Kailang Yang <kailang@realtek.com>
Date:   Thu Aug 22 16:46:56 2024 +0800

    ALSA: hda/realtek - FIxed ALC285 headphone no sound
    
    [ Upstream commit 1fa7b099d60ad64f559bd3b8e3f0d94b2e015514 ]
    
    On Dell platforms with ALC215, ALC285, ALC289, ALC225, ALC295, or ALC299,
    plugging in a headphone or headset could occasionally produce no sound
    from the headphone.
    Replacing the depop procedure solves this issue.
    
    Signed-off-by: Kailang Yang <kailang@realtek.com>
    Link: https://lore.kernel.org/d0de1b03fd174520945dde216d765223@realtek.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
ASoC: allow module autoloading for table board_ids
Author: Hongbo Li <lihongbo22@huawei.com>
Date:   Wed Aug 21 14:19:55 2024 +0800

    ASoC: allow module autoloading for table board_ids
    
    [ Upstream commit 5f7c98b7519a3a847d9182bd99d57ea250032ca1 ]
    
    Add MODULE_DEVICE_TABLE(), so modules could be properly
    autoloaded based on the alias from platform_device_id table.
    
    Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
    Link: https://patch.msgid.link/20240821061955.2273782-3-lihongbo22@huawei.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: allow module autoloading for table db1200_pids
Author: Hongbo Li <lihongbo22@huawei.com>
Date:   Wed Aug 21 14:19:54 2024 +0800

    ASoC: allow module autoloading for table db1200_pids
    
    [ Upstream commit 0e9fdab1e8df490354562187cdbb8dec643eae2c ]
    
    Add MODULE_DEVICE_TABLE(), so modules could be properly
    autoloaded based on the alias from platform_device_id table.
    
    Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
    Link: https://patch.msgid.link/20240821061955.2273782-2-lihongbo22@huawei.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: intel: fix module autoloading
Author: Liao Chen <liaochen4@huawei.com>
Date:   Mon Aug 26 08:49:21 2024 +0000

    ASoC: intel: fix module autoloading
    
    [ Upstream commit ae61a3391088d29aa8605c9f2db84295ab993a49 ]
    
    Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded
    based on the alias from of_device_id table.
    
    Signed-off-by: Liao Chen <liaochen4@huawei.com>
    Link: https://patch.msgid.link/20240826084924.368387-2-liaochen4@huawei.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: Intel: soc-acpi-cht: Make Lenovo Yoga Tab 3 X90F DMI match less strict
Author: Hans de Goede <hdegoede@redhat.com>
Date:   Fri Aug 23 09:43:05 2024 +0200

    ASoC: Intel: soc-acpi-cht: Make Lenovo Yoga Tab 3 X90F DMI match less strict
    
    [ Upstream commit 839a4ec06f75cec8fec2cc5fc14e921d0c3f7369 ]
    
    There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it
    turns out that the 2G version has a DMI product name of
    "CHERRYVIEW D1 PLATFORM" whereas the 4G version has
    "CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version checks are
    unique enough that the product-name check is not necessary.
    
    Drop the product-name check so that the existing DMI match for the 4G
    RAM version also matches the 2G RAM version.
    
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    Link: https://patch.msgid.link/20240823074305.16873-1-hdegoede@redhat.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: SOF: mediatek: Add missing board compatible
Author: Albert Jakieła <jakiela@google.com>
Date:   Fri Aug 9 13:56:27 2024 +0000

    ASoC: SOF: mediatek: Add missing board compatible
    
    [ Upstream commit c0196faaa927321a63e680427e075734ee656e42 ]
    
    Add Google Dojo compatible.
    
    Signed-off-by: Albert Jakieła <jakiela@google.com>
    Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
    Link: https://patch.msgid.link/20240809135627.544429-1-jakiela@google.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: tda7419: fix module autoloading
Author: Liao Chen <liaochen4@huawei.com>
Date:   Mon Aug 26 08:49:23 2024 +0000

    ASoC: tda7419: fix module autoloading
    
    [ Upstream commit 934b44589da9aa300201a00fe139c5c54f421563 ]
    
    Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded
    based on the alias from of_device_id table.
    
    Signed-off-by: Liao Chen <liaochen4@huawei.com>
    Link: https://patch.msgid.link/20240826084924.368387-4-liaochen4@huawei.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
block: Fix where bio IO priority gets set
Author: Hongyu Jin <hongyu.jin@unisoc.com>
Date:   Tue Jan 30 15:26:34 2024 -0500

    block: Fix where bio IO priority gets set
    
    [ Upstream commit f3c89983cb4fc00be64eb0d5cbcfcdf2cacb965e ]
    
    Commit 82b74cac2849 ("blk-ioprio: Convert from rqos policy to direct
    call") pushed setting bio I/O priority down into blk_mq_submit_bio()
    -- which is too low within block core's submit_bio() because it
    skips setting I/O priority for block drivers that implement
    fops->submit_bio() (e.g. DM, MD, etc).
    
    Fix this by moving bio_set_ioprio() up from blk-mq.c to blk-core.c and
    call it from submit_bio().  This ensures all block drivers call
    bio_set_ioprio() during initial bio submission.
    
    Fixes: a78418e6a04c ("block: Always initialize bio IO priority on submit")
    Co-developed-by: Yibin Ding <yibin.ding@unisoc.com>
    Signed-off-by: Yibin Ding <yibin.ding@unisoc.com>
    Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
    Reviewed-by: Eric Biggers <ebiggers@google.com>
    Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
    [snitzer: revised commit header]
    Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20240130202638.62600-2-snitzer@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
btrfs: calculate the right space for delayed refs when updating global reserve
Author: Filipe Manana <fdmanana@suse.com>
Date:   Tue Mar 21 11:13:59 2023 +0000

    btrfs: calculate the right space for delayed refs when updating global reserve
    
    commit f8f210dc84709804c9f952297f2bfafa6ea6b4bd upstream.
    
    When updating the global block reserve, we account for the 6 items needed
    by an unlink operation and the 6 delayed references for each one of those
    items. However the calculation for the delayed references is not correct
    in case we have the free space tree enabled, as in that case we need to
    touch the free space tree as well and therefore need twice the number of
    bytes. So use the btrfs_calc_delayed_ref_bytes() helper to calculate the
    number of bytes needed for the delayed references at
    btrfs_update_global_block_rsv().
    
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    [Diogo: this patch has been cherry-picked from the original commit;
    conflicts included lack of a define (picked from commit 5630e2bcfe223)
    and lack of btrfs_calc_delayed_ref_bytes (picked from commit 0e55a54502b97)
    - changed const struct -> struct for compatibility.]
    Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
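
The doubling described in this commit can be sketched in plain C. This is only an illustration under stated assumptions: `delayed_ref_bytes()` and its `bytes_per_ref` parameter are hypothetical stand-ins for the kernel helper and the per-item metadata size, not the btrfs implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the reservation for delayed references.
 * bytes_per_ref stands in for the metadata size reserved per reference.
 */
uint64_t delayed_ref_bytes(uint64_t bytes_per_ref, unsigned int num_refs,
			   bool free_space_tree)
{
	uint64_t bytes = bytes_per_ref * num_refs;

	/* With the free space tree enabled, that tree is touched as well,
	 * so twice the number of bytes is needed. */
	if (free_space_tree)
		bytes *= 2;

	return bytes;
}
```

With 6 delayed references, the reservation with the free space tree enabled is exactly twice the reservation without it.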

 
can: mcp251xfd: mcp251xfd_ring_init(): check TX-coalescing configuration
Author: Marc Kleine-Budde <mkl@pengutronix.de>
Date:   Fri Jul 5 17:24:42 2024 +0200

    can: mcp251xfd: mcp251xfd_ring_init(): check TX-coalescing configuration
    
    [ Upstream commit ac2b81eb8b2d104033560daea886ee84531e3d0a ]
    
    When changing the interface from CAN-CC to CAN-FD mode the old
    coalescing parameters are re-used. This might cause problems, as the
    configured parameters are too big for CAN-FD mode.
    
    During testing an invalid TX coalescing configuration has been seen.
    The problem should have been fixed in the previous patch, but add a
    safeguard here to ensure that the number of TEF coalescing buffers (if
    configured) is exactly half of all TEF buffers.
    
    Link: https://lore.kernel.org/all/20240805-mcp251xfd-fix-ringconfig-v1-2-72086f0ca5ee@pengutronix.de
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
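
The safeguard described above can be sketched as a small standalone check. The struct and function names here are hypothetical and greatly simplified; they are not the driver's real data structures.

```c
#include <errno.h>

/* Hypothetical, simplified ring parameters; not the driver's real structs. */
struct tef_ring_cfg {
	unsigned int obj_num;          /* total number of TEF buffers */
	unsigned int obj_num_coalesce; /* coalescing buffers, 0 = not configured */
};

/* Returns 0 if the configuration passes the safeguard, -EINVAL otherwise. */
int tef_coalesce_check(const struct tef_ring_cfg *cfg)
{
	/* Coalescing not configured: nothing to verify. */
	if (cfg->obj_num_coalesce == 0)
		return 0;

	/* If configured, the count must be exactly half of all TEF buffers. */
	if (cfg->obj_num_coalesce * 2 != cfg->obj_num)
		return -EINVAL;

	return 0;
}
```

A stale value carried over from CAN-CC mode, such as 8 coalescing buffers with only 8 TEF buffers in total, would be rejected by this check.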

can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
Author: Marc Kleine-Budde <mkl@pengutronix.de>
Date:   Wed Jan 11 12:10:04 2023 +0100

    can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
    
    commit a7801540f325d104de5065850a003f1d9bdc6ad3 upstream.
    
    The mcp251xfd wakes up from Low Power or Sleep Mode when SPI activity
    is detected. To avoid this, make sure that the timestamp worker is
    stopped before shutting down the chip.
    
    Split the starting of the timestamp worker out of
    mcp251xfd_timestamp_init() into the separate function
    mcp251xfd_timestamp_start().
    
    Call mcp251xfd_timestamp_init() before mcp251xfd_chip_start(), move
    mcp251xfd_timestamp_start() to mcp251xfd_chip_start(). In this way,
    mcp251xfd_timestamp_stop() can be called unconditionally by
    mcp251xfd_chip_stop().
    
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

can: mcp251xfd: properly indent labels
Author: Marc Kleine-Budde <mkl@pengutronix.de>
Date:   Thu Apr 25 10:14:45 2024 +0200

    can: mcp251xfd: properly indent labels
    
    commit 51b2a721612236335ddec4f3fb5f59e72a204f3a upstream.
    
    To fix the coding style, remove the whitespace in front of labels.
    
    Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
drm: komeda: Fix an issue related to normalized zpos
Author: hongchi.peng <hongchi.peng@siengine.com>
Date:   Mon Aug 26 10:45:17 2024 +0800

    drm: komeda: Fix an issue related to normalized zpos
    
    [ Upstream commit 258905cb9a6414be5c9ca4aa20ef855f8dc894d4 ]
    
    We use komeda_crtc_normalize_zpos to normalize zpos of affected planes
    to their blending zorder in CU. If there's only one slave plane in
    affected planes and its layer_split property is enabled, order++ for
    its split layer, so that when calculating the normalized_zpos
    of master planes, the split layer of the slave plane is included, but
    the max_slave_zorder does not include the split layer and stays zero
    because there's only one slave plane in the affected planes, although we
    actually use two slave layers in this commit.
    
    In most cases, this bug does not result in a commit failure, but assume
    the following situation:
        slave_layer 0: zpos = 0, layer_split enabled, normalized_zpos = 0;
            (uses slave_layer 2 as its split layer)
        master_layer 0: zpos = 2, layer_split enabled, normalized_zpos = 2;
            (uses master_layer 2 as its split layer)
        master_layer 1: zpos = 4, normalized_zpos = 4;
        master_layer 3: zpos = 5, normalized_zpos = 5;
        kcrtc_st->max_slave_zorder = 0;
    When we use master_layer 3 as an input of CU in function
    komeda_compiz_set_input and check it with function
    komeda_component_check_input, the parameter idx is equal to
    normalized_zpos minus max_slave_zorder. The value of idx is 5,
    which is equal to CU's max_active_inputs, so
    komeda_component_check_input returns -EINVAL.
    
    To fix the bug described above, when calculating the max_slave_zorder
    with the layer_split enabled, count the split layer in this calculation
    directly.
    
    Signed-off-by: hongchi.peng <hongchi.peng@siengine.com>
    Acked-by: Liviu Dudau <liviu.dudau@arm.com>
    Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240826024517.3739-1-hongchi.peng@siengine.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
gpio: prevent potential speculation leaks in gpio_device_get_desc()
Author: Hagar Hemdan <hagarhem@amazon.com>
Date:   Thu May 23 08:53:32 2024 +0000

    gpio: prevent potential speculation leaks in gpio_device_get_desc()
    
    commit d795848ecce24a75dfd46481aee066ae6fe39775 upstream.
    
    Userspace may trigger a speculative read of an address outside the gpio
    descriptor array.
    Users can do that by calling gpio_ioctl() with an offset out of range.
    Offset is copied from user and then used as an array index to get
    the gpio descriptor without sanitization in gpio_device_get_desc().
    
    This change ensures that the offset is sanitized by using
    array_index_nospec() to mitigate any possibility of speculative
    information leaks.
    
    This bug was discovered and resolved using Coverity Static Analysis
    Security Testing (SAST) by Synopsys, Inc.
    
    Signed-off-by: Hagar Hemdan <hagarhem@amazon.com>
    Link: https://lore.kernel.org/r/20240523085332.1801-1-hagarhem@amazon.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource@witekio.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
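
The index sanitization idea can be illustrated in userspace C. This is a sketch only: the names are hypothetical, the mask mirrors the branchless clamping approach behind array_index_nospec(), and it assumes an arithmetic right shift on signed long and a size no larger than LONG_MAX. A real mitigation also has to stop the compiler from optimizing the mask away, which this sketch does not attempt.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* All ones if index < size, zero otherwise, computed without a branch. */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG - 1));
}

/* Clamp an out-of-range index to 0 so it can never address past the array. */
unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

/* Hypothetical descriptor type, standing in for a GPIO descriptor. */
struct line_desc {
	int id;
};

/* Bounds check first, then use only the sanitized index for the access. */
const struct line_desc *get_desc(const struct line_desc *descs,
				 unsigned long ndescs, unsigned long hwnum)
{
	if (hwnum >= ndescs)
		return NULL;

	return &descs[clamp_index_nospec(hwnum, ndescs)];
}
```

The architectural bounds check still rejects bad offsets; the mask only ensures that a mispredicted branch cannot dereference an out-of-range element.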

 
gpiolib: cdev: Ignore reconfiguration without direction
Author: Kent Gibson <warthog618@gmail.com>
Date:   Wed Jun 26 13:29:23 2024 +0800

    gpiolib: cdev: Ignore reconfiguration without direction
    
    commit b440396387418fe2feaacd41ca16080e7a8bc9ad upstream.
    
    linereq_set_config() behaves badly when direction is not set.
    The configuration validation is borrowed from linereq_create(), where,
    to verify the intent of the user, the direction must be set in order to
    effect a change to the electrical configuration of a line. But, when
    applied to reconfiguration, that validation does not allow for the unset
    direction case, making it possible to clear flags set previously without
    specifying the line direction.
    
    Adding to the inconsistency, those changes are not immediately applied by
    linereq_set_config(), but take effect when the line value is next read
    or set.
    
    For example, by requesting a configuration with no flags set, an output
    line with GPIO_V2_LINE_FLAG_ACTIVE_LOW and GPIO_V2_LINE_FLAG_OPEN_DRAIN
    set could have those flags cleared, inverting the sense of the line and
    changing the line drive to push-pull on the next line value set.
    
    Skip the reconfiguration of lines for which the direction is not set, and
    only reconfigure the lines for which direction is set.
    
    Fixes: a54756cb24ea ("gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL")
    Signed-off-by: Kent Gibson <warthog618@gmail.com>
    Link: https://lore.kernel.org/r/20240626052925.174272-3-warthog618@gmail.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
hwmon: (asus-ec-sensors) remove VRM temp X570-E GAMING
Author: Ross Brown <true.robot.ross@gmail.com>
Date:   Tue Jul 30 08:21:42 2024 +0200

    hwmon: (asus-ec-sensors) remove VRM temp X570-E GAMING
    
    [ Upstream commit 9efaebc0072b8e95505544bf385c20ee8a29d799 ]
    
    X570-E GAMING does not have VRM temperature sensor.
    
    Signed-off-by: Ross Brown <true.robot.ross@gmail.com>
    Signed-off-by: Eugene Shalygin <eugene.shalygin@gmail.com>
    Link: https://lore.kernel.org/r/20240730062320.5188-2-eugene.shalygin@gmail.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
Linux: Linux 6.1.112
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Mon Sep 30 16:23:56 2024 +0200

    Linux 6.1.112
    
    Link: https://lore.kernel.org/r/20240927121719.897851549@linuxfoundation.org
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Allen Pais <apais@linux.microsoft.com>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Salvatore Bonaccorso <carnil@debian.org>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: kernelci.org bot <bot@kernelci.org>
    Tested-by: Pavel Machek (CIP) <pavel@denx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE
Author: Huacai Chen <chenhuacai@kernel.org>
Date:   Mon Aug 26 23:11:32 2024 +0800

    LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE
    
    [ Upstream commit 274ea3563e5ab9f468c15bfb9d2492803a66d9be ]
    
    Currently we call irq_set_noprobe() in a loop for all IRQs, but in fact
    it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only
    legacy interrupts have been allocated.
    
    Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hwirq.h
    and the core will automatically set the flag for all interrupts.
    
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
microblaze: don't treat zero reserved memory regions as error
Author: Mike Rapoport <rppt@kernel.org>
Date:   Mon Jul 29 08:33:27 2024 +0300

    microblaze: don't treat zero reserved memory regions as error
    
    [ Upstream commit 0075df288dd8a7abfe03b3766176c393063591dd ]
    
    Before commit 721f4a6526da ("mm/memblock: remove empty dummy entry") the
    check that memblock.reserved.cnt is non-zero in mmu_init() would always
    be true, either because memblock.reserved.cnt is initialized to 1 or
    because there were memory reservations earlier.
    
    The removal of the dummy empty entry in memblock caused this check to
    fail because now memblock.reserved.cnt is initialized to 0.
    
    Remove the check that memblock.reserved.cnt is non-zero because it's
    perfectly fine to have an empty memblock.reserved array that early in
    boot.
    
    Reported-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Mike Rapoport <rppt@kernel.org>
    Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
    Tested-by: Guenter Roeck <linux@roeck-us.net>
    Link: https://lore.kernel.org/r/20240729053327.4091459-1-rppt@kernel.org
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
net: ftgmac100: Ensure tx descriptor updates are visible
Author: Jacky Chou <jacky_chou@aspeedtech.com>
Date:   Thu Aug 22 15:30:06 2024 +0800

    net: ftgmac100: Ensure tx descriptor updates are visible
    
    [ Upstream commit 4186c8d9e6af57bab0687b299df10ebd47534a0a ]
    
    The driver must ensure TX descriptor updates are visible
    before updating TX pointer and TX clear pointer.
    
    This resolves TX hangs observed on AST2600 when running
    iperf3.
    
    Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
netfilter: nf_tables: missing iterator type in lookup walk
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Sep 17 22:25:04 2024 +0200

    netfilter: nf_tables: missing iterator type in lookup walk
    
    commit efefd4f00c967d00ad7abe092554ffbb70c1a793 upstream.
    
    Add missing decorator type to lookup expression and tighten WARN_ON_ONCE
    check in pipapo to spot earlier that this is unset.
    
    Fixes: 29b359cf6d95 ("netfilter: nft_set_pipapo: walk over current view on netlink dump")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nft_set_pipapo: walk over current view on netlink dump
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Sep 17 22:25:03 2024 +0200

    netfilter: nft_set_pipapo: walk over current view on netlink dump
    
    commit 29b359cf6d95fd60730533f7f10464e95bd17c73 upstream.
    
    The generation mask can be updated while netlink dump is in progress.
    The pipapo set backend walk iterator cannot rely on it to infer what
    view of the datastructure is to be used. Add notation to specify if user
    wants to read/update the set.
    
    Based on patch from Florian Westphal.
    
    Fixes: 2b84e215f874 ("netfilter: nft_set_pipapo: .walk does not deal with generations")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Sat Sep 14 12:56:51 2024 +0300

    netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
    
    commit 7052622fccb1efb850c6b55de477f65d03525a30 upstream.
    
    The cgroup_get_from_path() function never returns NULL, it returns error
    pointers.  Update the error handling to match.
    
    Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Acked-by: Florian Westphal <fw@strlen.de>
    Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Link: https://patch.msgid.link/bbc0c4e0-05cc-4f44-8797-2f4b3920a820@stanley.mountain
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
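
The class of bug fixed here can be shown with a userspace imitation of the kernel's error-pointer convention. The ERR_PTR(), PTR_ERR() and IS_ERR() helpers below are simplified re-creations for illustration, and `cgrp_node`, `node_get_from_path()` and `node_level()` are hypothetical names, not kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical node type and lookup, for illustration only. */
struct cgrp_node {
	int level;
};

struct cgrp_node root_node = { 0 };

/* Never returns NULL: failure is reported as an error pointer. */
struct cgrp_node *node_get_from_path(const char *path)
{
	if (path == NULL || path[0] != '/')
		return ERR_PTR(-ENOENT);

	return &root_node;
}

int node_level(const char *path)
{
	struct cgrp_node *node = node_get_from_path(path);

	/* A "!node" test would miss the failure and dereference garbage. */
	if (IS_ERR(node))
		return (int)PTR_ERR(node);

	return node->level;
}
```

Because the failing lookup returns a non-NULL error pointer, only the IS_ERR() test catches it.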

netfilter: nft_socket: make cgroupsv2 matching work with namespaces
Author: Florian Westphal <fw@strlen.de>
Date:   Sat Sep 7 16:07:49 2024 +0200

    netfilter: nft_socket: make cgroupsv2 matching work with namespaces
    
    commit 7f3287db654395f9c5ddd246325ff7889f550286 upstream.
    
    When running in a container environment, /sys/fs/cgroup/ might not be
    the real root node of the sk-attached cgroup.
    
    Example:
    
    In container:
    % stat /sys/fs/cgroup/
    Device: 0,21    Inode: 2214  ..
    % stat /sys/fs/cgroup/foo
    Device: 0,21    Inode: 2264  ..
    
    The expectation would be for:
    
      nft add rule .. socket cgroupv2 level 1 "foo" counter
    
    to match traffic from a process that got added to "foo" via
    "echo $pid > /sys/fs/cgroup/foo/cgroup.procs".
    
    However, 'level 3' is needed to make this work.
    
    Seen from initial namespace, the complete hierarchy is:
    
    % stat /sys/fs/cgroup/system.slice/docker-.../foo
      Device: 0,21    Inode: 2264 ..
    
    i.e. hierarchy is
    0    1               2              3
    / -> system.slice -> docker-1... -> foo
    
    ... but the container doesn't know that its "/" is the "docker-1.."
    cgroup.  Current code will retrieve the 'system.slice' cgroup node
    and store its kn->id in the destination register, so compare with
    2264 ("foo" cgroup id) will not match.
    
    Fetch "/" cgroup from ->init() and add its level to the level we try to
    extract.  cgroup root-level is 0 for the init-namespace or the level
    of the ancestor that is exposed as the cgroup root inside the container.
    
    In the above case, cgrp->level of "/" resolved in the container is 2
    (docker-1...scope/) and request for 'level 1' will get adjusted
    to fetch the actual level (3).
    
    v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it.
        (kernel test robot)
    
    Cc: cgroups@vger.kernel.org
    Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2")
    Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
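
The level adjustment described in this commit comes down to simple arithmetic, sketched below. The function name and parameters are hypothetical; the real code resolves the namespace root cgroup at ->init() time and works on cgroup objects rather than bare integers.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate a level requested relative to the
 * namespace's cgroup root into a level in the full hierarchy.
 * ns_root_level is 0 in the init namespace, or the level of the ancestor
 * that is exposed as the cgroup root inside the container.
 */
int effective_cgroup_level(int ns_root_level, int requested_level)
{
	if (ns_root_level < 0 || requested_level < 0)
		return -EINVAL;

	return ns_root_level + requested_level;
}
```

For the example above, the container's "/" sits at level 2, so a request for 'level 1' resolves to level 3, while in the init namespace the requested level is used unchanged.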

 
ocfs2: add bounds checking to ocfs2_xattr_find_entry()
Author: Ferry Meng <mengferry@linux.alibaba.com>
Date:   Mon May 20 10:40:23 2024 +0800

    ocfs2: add bounds checking to ocfs2_xattr_find_entry()
    
    [ Upstream commit 9e3041fecdc8f78a5900c3aa51d3d756e73264d6 ]
    
    Add a paranoia check to make sure the scan doesn't stray beyond the
    valid memory region containing ocfs2 xattr entries when looking for a
    match.  It prevents out-of-bounds access in case of crafted images.
    
    Link: https://lkml.kernel.org/r/20240520024024.1976129-1-joseph.qi@linux.alibaba.com
    Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
    Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Reported-by: lei lu <llfamsec@gmail.com>
    Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Gang He <ghe@suse.com>
    Cc: Jun Piao <piaojun@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Stable-dep-of: af77c4fc1871 ("ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()
Author: Ferry Meng <mengferry@linux.alibaba.com>
Date:   Mon May 20 10:40:24 2024 +0800

    ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()
    
    [ Upstream commit af77c4fc1871847b528d58b7fdafb4aa1f6a9262 ]
    
    An xattr in ocfs2 may be 'non-indexed', which is saved with additional
    space requested.  It's better to check whether the memory is out of
    bounds before memcmp, although this possibility mainly comes from
    crafted poisonous images.
    
    Link: https://lkml.kernel.org/r/20240520024024.1976129-2-joseph.qi@linux.alibaba.com
    Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
    Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Reported-by: lei lu <llfamsec@gmail.com>
    Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Gang He <ghe@suse.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
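
The check-before-memcmp pattern from the two ocfs2 commits can be sketched in userspace C. The entry layout, function name and return codes below are hypothetical simplifications and do not reflect the ocfs2 on-disk format or its error codes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry; not the ocfs2 on-disk layout. */
struct entry {
	uint16_t name_offset; /* offset of the name from the buffer start */
	uint8_t name_len;     /* length of the name in bytes */
};

/*
 * Returns the index of the matching entry, -ENOENT if there is none, or
 * -EINVAL if a candidate's name would extend past the end of the buffer.
 */
int find_entry(const uint8_t *buf, size_t buf_len,
	       const struct entry *entries, size_t count,
	       const char *name, size_t name_len)
{
	size_t i;

	for (i = 0; i < count; i++) {
		const struct entry *e = &entries[i];

		if (e->name_len != name_len)
			continue;

		/* Verify the name lies inside the buffer before comparing. */
		if (e->name_offset > buf_len ||
		    name_len > buf_len - e->name_offset)
			return -EINVAL;

		if (memcmp(buf + e->name_offset, name, name_len) == 0)
			return (int)i;
	}

	return -ENOENT;
}
```

Writing the test as `name_len > buf_len - name_offset`, after checking the offset itself, avoids overflow in the addition that a naive `offset + len > buf_len` comparison could hit.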

 
pinctrl: at91: make it work with current gpiolib
Author: Thomas Blocher <thomas.blocher@ek-dev.de>
Date:   Wed Jul 31 01:16:26 2024 +0200

    pinctrl: at91: make it work with current gpiolib
    
    [ Upstream commit 752f387faaae0ae2e84d3f496922524785e77d60 ]
    
    pinctrl-at91 currently does not support the gpio-groups devicetree
    property and has no pin-range.
    Because of this at91 gpios stopped working since patch
    commit 2ab73c6d8323fa1e ("gpio: Support GPIO controllers without pin-ranges")
    This was discussed in the patches
    commit fc328a7d1fcce263 ("gpio: Revert regression in sysfs-gpio (gpiolib.c)")
    commit 56e337f2cf132632 ("Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"")
    
    As a workaround manually set pin-range via gpiochip_add_pin_range() until
    a) pinctrl-at91 is reworked to support devicetree gpio-groups
    b) another solution as mentioned in
    commit 56e337f2cf132632 ("Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"")
    is found
    
    Signed-off-by: Thomas Blocher <thomas.blocher@ek-dev.de>
    Link: https://lore.kernel.org/5b992862-355d-f0de-cd3d-ff99e67a4ff1@ek-dev.de
    Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
powercap: RAPL: fix invalid initialization for pl4_supported field
Author: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Date:   Thu Jun 8 08:00:06 2023 +0530

    powercap: RAPL: fix invalid initialization for pl4_supported field
    
    commit d05b5e0baf424c8c4b4709ac11f66ab726c8deaf upstream.
    
    The current initialization of the struct x86_cpu_id via
    pl4_support_ids[] is partial and wrong. It is initializing
    "stepping" field with "X86_FEATURE_ANY" instead of "feature" field.
    
    Use X86_MATCH_INTEL_FAM6_MODEL macro instead of initializing
    each field of the struct x86_cpu_id for pl4_supported list of CPUs.
    This X86_MATCH_INTEL_FAM6_MODEL macro internally uses another macro
    X86_MATCH_VENDOR_FAM_MODEL_FEATURE for X86 based CPU matching with
    appropriate initialized values.
    
    Reported-by: Dave Hansen <dave.hansen@intel.com>
    Link: https://lore.kernel.org/lkml/28ead36b-2d9e-1a36-6f4e-04684e420260@intel.com
    Fixes: eb52bc2ae5b8 ("powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC")
    Fixes: b08b95cf30f5 ("powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P")
    Fixes: 515755906921 ("powercap: RAPL: Add Power Limit4 support for RaptorLake")
    Fixes: 1cc5b9a411e4 ("powercap: Add Power Limit4 support for Alder Lake SoC")
    Fixes: 8365a898fe53 ("powercap: Add Power Limit4 support")
    Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    [ Ricardo: I removed METEORLAKE and METEORLAKE_L from pl4_support_ids as
      they are not included in v6.1. ]
    Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
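
The initialization mistake described above is easy to reproduce with any C struct that is filled positionally. The sketch below uses a hypothetical `cpu_id` struct, macro and constants as stand-ins; it is not the kernel's struct x86_cpu_id or its X86_MATCH_* macros.

```c
#include <stdint.h>

/* Hypothetical stand-in for a CPU match entry. */
struct cpu_id {
	uint16_t vendor;
	uint16_t family;
	uint16_t model;
	uint16_t steppings;
	uint16_t feature;
};

#define EXAMPLE_MODEL   0x8A /* hypothetical model number */
#define EXAMPLE_FEATURE 7    /* hypothetical feature value */

/* Positional: the fourth value lands in .steppings, and .feature stays 0. */
const struct cpu_id positional = { 0, 6, EXAMPLE_MODEL, EXAMPLE_FEATURE };

/* Hypothetical matching macro that names every field explicitly. */
#define MATCH_FAM6_MODEL(m, feat)                                \
	{ .vendor = 0, .family = 6, .model = (m), .steppings = 0, \
	  .feature = (feat) }

const struct cpu_id designated = MATCH_FAM6_MODEL(EXAMPLE_MODEL,
						  EXAMPLE_FEATURE);
```

Routing the values through a macro with designated initializers keeps each one in the intended field even if the struct layout changes later.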

 
Revert "wifi: cfg80211: check wiphy mutex is held for wdev mutex"
Author: Ping-Ke Shih <pkshih@realtek.com>
Date:   Thu Sep 26 08:30:17 2024 +0800

    Revert "wifi: cfg80211: check wiphy mutex is held for wdev mutex"
    
    This reverts commit 19d13ec00a8b1d60c5cc06bd0006b91d5bd8d46f which is
    commit 1474bc87fe57deac726cc10203f73daa6c3212f7 upstream.
    
    The reverted commit is based on an implementation of wiphy locking that
    is not planned to be redone on a stable kernel, so revert it to avoid
    this warning:
    
     WARNING: CPU: 0 PID: 9 at net/wireless/core.h:231 disconnect_work+0xb8/0x144 [cfg80211]
     CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.51-00141-ga1649b6f8ed6 #7
     Hardware name: Freescale i.MX6 SoloX (Device Tree)
     Workqueue: events disconnect_work [cfg80211]
      unwind_backtrace from show_stack+0x10/0x14
      show_stack from dump_stack_lvl+0x58/0x70
      dump_stack_lvl from __warn+0x70/0x1c0
      __warn from warn_slowpath_fmt+0x16c/0x294
      warn_slowpath_fmt from disconnect_work+0xb8/0x144 [cfg80211]
      disconnect_work [cfg80211] from process_one_work+0x204/0x620
      process_one_work from worker_thread+0x1b0/0x474
      worker_thread from kthread+0x10c/0x12c
      kthread from ret_from_fork+0x14/0x24
    
    Reported-by: petter@technux.se
    Closes: https://lore.kernel.org/linux-wireless/9e98937d781c990615ef27ee0c858ff9@technux.se/T/#t
    Cc: Johannes Berg <johannes@sipsolutions.net>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
scsi: lpfc: Fix overflow build issue [+ + +]
Author: Sherry Yang <sherry.yang@oracle.com>
Date:   Tue Aug 20 23:51:31 2024 -0700

    scsi: lpfc: Fix overflow build issue
    
    [ Upstream commit 3417c9574e368f0330637505f00d3814ca8854d2 ]
    
    The build fails when "CONFIG_GCOV_KERNEL=y" and
    "CONFIG_GCOV_PROFILE_ALL=y" are enabled, with the following error:
    
    BUILDSTDERR: drivers/scsi/lpfc/lpfc_bsg.c: In function 'lpfc_get_cgnbuf_info':
    BUILDSTDERR: ./include/linux/fortify-string.h:114:33: error: '__builtin_memcpy' accessing 18446744073709551615 bytes at offsets 0 and 0 overlaps 9223372036854775807 bytes at offset -9223372036854775808 [-Werror=restrict]
    BUILDSTDERR:   114 | #define __underlying_memcpy     __builtin_memcpy
    BUILDSTDERR:       |                                 ^
    BUILDSTDERR: ./include/linux/fortify-string.h:637:9: note: in expansion of macro '__underlying_memcpy'
    BUILDSTDERR:   637 |         __underlying_##op(p, q, __fortify_size);                        \
    BUILDSTDERR:       |         ^~~~~~~~~~~~~
    BUILDSTDERR: ./include/linux/fortify-string.h:682:26: note: in expansion of macro '__fortify_memcpy_chk'
    BUILDSTDERR:   682 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
    BUILDSTDERR:       |                          ^~~~~~~~~~~~~~~~~~~~
    BUILDSTDERR: drivers/scsi/lpfc/lpfc_bsg.c:5468:9: note: in expansion of macro 'memcpy'
    BUILDSTDERR:  5468 |         memcpy(cgn_buff, cp, cinfosz);
    BUILDSTDERR:       |         ^~~~~~
    
    This happens since commit 06bb7fc0feee ("kbuild: turn on -Wrestrict by
    default"). Address this issue by using the size_t type.
    
    Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
    Link: https://lore.kernel.org/r/20240821065131.1180791-1-sherry.yang@oracle.com
    Reviewed-by: Justin Tee <justin.tee@broadcom.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
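
    The byte count in the diagnostic above, 18446744073709551615, is the
    value that -1 takes after conversion to a 64-bit size_t. The following
    is a minimal userspace sketch of that conversion and of a copy helper
    that keeps its lengths in size_t throughout. The names as_copy_size()
    and bounded_copy() are hypothetical and are not part of the lpfc driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What a signed int length becomes once it is used as a memcpy() size. */
size_t as_copy_size(int len)
{
	return (size_t)len;
}

/*
 * Hypothetical helper, not lpfc code: copy at most dst_len bytes.
 * Both lengths are size_t, so no negative value can reach memcpy().
 * Returns the number of bytes copied.
 */
size_t bounded_copy(void *dst, size_t dst_len, const void *src, size_t src_len)
{
	size_t n = src_len < dst_len ? src_len : dst_len;

	memcpy(dst, src, n);
	return n;
}
```

    Keeping the length unsigned from the point where it is computed means
    the compiler has no negative range to reason about at the memcpy() call.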

 
selftests: mptcp: join: restrict fullmesh endp on 1st sf [+ + +]
Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Tue Sep 10 21:06:36 2024 +0200

    selftests: mptcp: join: restrict fullmesh endp on 1st sf
    
    commit 49ac6f05ace5bb0070c68a0193aa05d3c25d4c83 upstream.
    
    A new endpoint using the IP of the initial subflow has been recently
    added to increase the code coverage. But it breaks the test when using
    old kernels not having commit 86e39e04482b ("mptcp: keep track of local
    endpoint still available for each msk"), e.g. on v5.15.
    
    Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local
    endpoint being tracked or not"), it is possible to add the new endpoint
    conditionally, by checking if "mptcp_pm_subflow_check_next" is present
    in kallsyms: this is not directly linked to the commit introducing this
    symbol but for the parent one which is linked anyway. So we can know in
    advance what will be the expected behaviour, and add the new endpoint
    only when it makes sense to do so.
    
    Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf")
    Cc: stable@vger.kernel.org
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    [ Conflicts in mptcp_join.sh, because the 'run_tests' helper has been
      modified in multiple commits that are not in this version, e.g. commit
      e571fb09c893 ("selftests: mptcp: add speed env var"). The conflict was
      in the context, the new lines can still be added at the same place. ]
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
smb: client: fix hang in wait_for_response() for negproto [+ + +]
Author: Paulo Alcantara <pc@manguebit.com>
Date:   Sat Aug 31 21:40:28 2024 -0300

    smb: client: fix hang in wait_for_response() for negproto
    
    [ Upstream commit 7ccc1465465d78e6411b7bd730d06e7435802b5c ]
    
    Call cifs_reconnect() to wake up processes waiting on negotiate
    protocol to handle the case where server abruptly shut down and had no
    chance to properly close the socket.
    
    Simple reproducer:
    
      ssh 192.168.2.100 pkill -STOP smbd
      mount.cifs //192.168.2.100/test /mnt -o ... [never returns]
    
    Cc: Rickard Andersson <rickaran@axis.com>
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
spi: bcm63xx: Enable module autoloading [+ + +]
Author: Liao Chen <liaochen4@huawei.com>
Date:   Sat Aug 31 09:42:31 2024 +0000

    spi: bcm63xx: Enable module autoloading
    
    [ Upstream commit 709df70a20e990d262c473ad9899314039e8ec82 ]
    
    Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based
    on the alias from of_device_id table.
    
    Signed-off-by: Liao Chen <liaochen4@huawei.com>
    Link: https://patch.msgid.link/20240831094231.795024-1-liaochen4@huawei.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: spidev: Add an entry for elgin,jg10309-01 [+ + +]
Author: Fabio Estevam <festevam@gmail.com>
Date:   Wed Aug 28 15:00:56 2024 -0300

    spi: spidev: Add an entry for elgin,jg10309-01
    
    [ Upstream commit 5f3eee1eef5d0edd23d8ac0974f56283649a1512 ]
    
    The rv1108-elgin-r1 board has an LCD controlled via SPI in userspace.
    The marking on the LCD is JG10309-01.
    
    Add the "elgin,jg10309-01" compatible string.
    
    Signed-off-by: Fabio Estevam <festevam@gmail.com>
    Reviewed-by: Heiko Stuebner <heiko@sntech.de>
    Link: https://patch.msgid.link/20240828180057.3167190-2-festevam@gmail.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: spidev: Add missing spi_device_id for jg10309-01 [+ + +]
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date:   Tue Sep 3 14:32:27 2024 +0200

    spi: spidev: Add missing spi_device_id for jg10309-01
    
    [ Upstream commit 5478a4f7b94414def7b56d2f18bc2ed9b0f3f1f2 ]
    
    When the of_device_id entry for "elgin,jg10309-01" was added, the
    corresponding spi_device_id was forgotten, causing a warning message
    during boot-up:
    
        SPI driver spidev has no spi_device_id for elgin,jg10309-01
    
    Fix module autoloading and shut up the warning by adding the missing
    entry.
    
    Fixes: 5f3eee1eef5d0edd ("spi: spidev: Add an entry for elgin,jg10309-01")
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Link: https://patch.msgid.link/54bbb9d8a8db7e52d13e266f2d4a9bcd8b42a98a.1725366625.git.geert+renesas@glider.be
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
tools: hv: rm .*.cmd when make clean [+ + +]
Author: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Date:   Mon Sep 2 12:21:03 2024 +0800

    tools: hv: rm .*.cmd when make clean
    
    [ Upstream commit 5e5cc1eb65256e6017e3deec04f9806f2f317853 ]
    
    Remove .*.cmd files when running make clean.
    
    Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
    Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
    Link: https://lore.kernel.org/r/20240902042103.5867-1-zhangjiao2@cmss.chinamobile.com
    Signed-off-by: Wei Liu <wei.liu@kernel.org>
    Message-ID: <20240902042103.5867-1-zhangjiao2@cmss.chinamobile.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
USB: serial: pl2303: add device id for Macrosilicon MS3020 [+ + +]
Author: Junhao Xie <bigfoot@classfun.cn>
Date:   Tue Sep 3 23:06:38 2024 +0800

    USB: serial: pl2303: add device id for Macrosilicon MS3020
    
    commit 7d47d22444bb7dc1b6d768904a22070ef35e1fc0 upstream.
    
    Add the device id for the Macrosilicon MS3020 which is a
    PL2303HXN based device.
    
    Signed-off-by: Junhao Xie <bigfoot@classfun.cn>
    Cc: stable@vger.kernel.org
    Signed-off-by: Johan Hovold <johan@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: usbtmc: prevent kernel-usb-infoleak [+ + +]
Author: Edward Adam Davis <eadavis@qq.com>
Date:   Sun Sep 8 17:17:41 2024 +0800

    USB: usbtmc: prevent kernel-usb-infoleak
    
    commit 625fa77151f00c1bd00d34d60d6f2e710b3f9aad upstream.
    
    syzbot reported a kernel-usb-infoleak in usbtmc_write(); clear the
    structure before filling in its fields.
    
    Fixes: 4ddc645f40e9 ("usb: usbtmc: Add ioctl for vendor specific write")
    Reported-and-tested-by: syzbot+9d34f80f841e948c3fdb@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=9d34f80f841e948c3fdb
    Signed-off-by: Edward Adam Davis <eadavis@qq.com>
    Cc: stable <stable@kernel.org>
    Link: https://lore.kernel.org/r/tencent_9649AA6EC56EDECCA8A7D106C792D1C66B06@qq.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
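
    The general pattern of clearing a buffer before filling in its fields
    can be sketched in userspace as follows. This is an illustration under
    assumptions, not the actual usbtmc change: build_message(), the 12-byte
    HDR_SIZE and the field offsets are hypothetical. The point is that the
    alignment bytes after the payload are never written by any field
    assignment, so they must already be zero when the buffer is sent.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDR_SIZE 12u	/* assumed header length for this sketch */

/*
 * Build header + payload, rounded up to a 4-byte boundary.
 * calloc() zeroes the whole buffer, so the alignment bytes that no
 * field assignment touches cannot carry stale heap contents.
 * Returns NULL on allocation failure; the caller frees the result.
 */
uint8_t *build_message(const uint8_t *payload, size_t len, size_t *total)
{
	size_t padded = (HDR_SIZE + len + 3u) & ~(size_t)3u;
	uint8_t *buf = calloc(1, padded);

	if (!buf)
		return NULL;
	buf[0] = 1;			/* hypothetical message ID */
	buf[4] = (uint8_t)(len & 0xff);	/* payload length, low byte only */
	memcpy(buf + HDR_SIZE, payload, len);
	*total = padded;
	return buf;
}
```

    With a 5-byte payload the buffer is 20 bytes long and bytes 17 to 19
    are padding; they read as zero only because the allocation was cleared.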

 
wifi: iwlwifi: clear trans->state earlier upon error [+ + +]
Author: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date:   Sun Aug 25 19:17:01 2024 +0300

    wifi: iwlwifi: clear trans->state earlier upon error
    
    [ Upstream commit 094513f8a2fbddee51b055d8035f995551f98fce ]
    
    When the firmware crashed, we first notified the op_mode and only then
    changed the transport's state. This is a problem if the op_mode's
    nic_error() handler needs to send a host command: it'll see that the
    transport's state still reflects that the firmware is alive.
    
    Today, this has no consequences since we set the STATUS_FW_ERROR bit and
    that will prevent sending host commands. iwl_fw_dbg_stop_restart_recording
    looks at this bit to know not to send a host command for example.
    
    To fix the hibernation, we needed to reset the firmware without having
    an error and checking STATUS_FW_ERROR to see whether the firmware is
    alive will no longer hold, so this change is necessary as well.
    
    Change the flow a bit.
    Change trans->state before calling the op_mode's nic_error() method and
    check trans->state instead of STATUS_FW_ERROR. This will keep the
    current behavior of iwl_fw_dbg_stop_restart_recording upon firmware
    error, and it'll allow us to call iwl_fw_dbg_stop_restart_recording
    safely even if STATUS_FW_ERROR is clear, but yet, the firmware is not
    alive.
    
    Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://patch.msgid.link/20240825191257.9d7427fbdfd7.Ia056ca57029a382c921d6f7b6a6b28fc480f2f22@changeid
    [I missed this was a dependency for the hibernation fix, changed
     the commit message a bit accordingly]
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: iwlwifi: lower message level for FW buffer destination [+ + +]
Author: Benjamin Berg <benjamin.berg@intel.com>
Date:   Sun Aug 25 19:17:13 2024 +0300

    wifi: iwlwifi: lower message level for FW buffer destination
    
    [ Upstream commit f8a129c1e10256c785164ed5efa5d17d45fbd81b ]
    
    An invalid buffer destination is not a problem for the driver and it
    does not make sense to report it with the KERN_ERR message level. As
    such, change the message to use IWL_DEBUG_FW.
    
    Reported-by: Len Brown <lenb@kernel.org>
    Closes: https://lore.kernel.org/r/CAJvTdKkcxJss=DM2sxgv_MR5BeZ4_OC-3ad6tA40TYH2yqHCWw@mail.gmail.com
    Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://patch.msgid.link/20240825191257.20abf78f05bc.Ifbcecc2ae9fb40b9698302507dcba8b922c8d856@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead [+ + +]
Author: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date:   Sun Aug 25 19:17:04 2024 +0300

    wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead
    
    [ Upstream commit 3a84454f5204718ca5b4ad2c1f0bf2031e2403d1 ]
    
    There is a WARNING in iwl_trans_wait_tx_queues_empty() (that was
    recently converted from just a message), that can be hit if we
    wait for TX queues to become empty after firmware died. Clearly,
    we can't expect anything from the firmware after it's declared dead.
    
    Don't call iwl_trans_wait_tx_queues_empty() in this case. While it could
    be a good idea to stop the flow earlier, the flush functions do some
    maintenance work that is not related to the firmware, so keep that part
    of the code running even when the firmware is not running.
    
    Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://patch.msgid.link/20240825191257.a7cbd794cee9.I44a739fbd4ffcc46b83844dd1c7b2eb0c7b270f6@changeid
    [edit commit message]
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation [+ + +]
Author: Daniel Gabay <daniel.gabay@intel.com>
Date:   Sun Aug 25 19:17:05 2024 +0300

    wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation
    
    [ Upstream commit d44162280899c3fc2c6700e21e491e71c3c96e3d ]
    
    The calculation should also consider the length of the 6 GHz IE; fix that.
    In addition, in iwl_mvm_sched_scan_start() the scan_fits helper is
    called only when non_psc_included is true, but it should be called
    regardless; fix that as well.
    
    Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
    Reviewed-by: Ilan Peer <ilan.peer@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://patch.msgid.link/20240825191257.7db825442fd2.I99f4d6587709de02072fd57957ec7472331c6b1d@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: iwlwifi: mvm: pause TCM when the firmware is stopped [+ + +]
Author: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date:   Sun Aug 25 19:17:10 2024 +0300

    wifi: iwlwifi: mvm: pause TCM when the firmware is stopped
    
    [ Upstream commit 0668ebc8c2282ca1e7eb96092a347baefffb5fe7 ]
    
    Not doing so will make us send a host command to the transport while the
    firmware is not alive, which will trigger a WARNING.
    
    bad state = 0
    WARNING: CPU: 2 PID: 17434 at drivers/net/wireless/intel/iwlwifi/iwl-trans.c:115 iwl_trans_send_cmd+0x1cb/0x1e0 [iwlwifi]
    RIP: 0010:iwl_trans_send_cmd+0x1cb/0x1e0 [iwlwifi]
    Call Trace:
     <TASK>
     iwl_mvm_send_cmd+0x40/0xc0 [iwlmvm]
     iwl_mvm_config_scan+0x198/0x260 [iwlmvm]
     iwl_mvm_recalc_tcm+0x730/0x11d0 [iwlmvm]
     iwl_mvm_tcm_work+0x1d/0x30 [iwlmvm]
     process_one_work+0x29e/0x640
     worker_thread+0x2df/0x690
     ? rescuer_thread+0x540/0x540
     kthread+0x192/0x1e0
     ? set_kthread_struct+0x90/0x90
     ret_from_fork+0x22/0x30
    
    Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://patch.msgid.link/20240825191257.5abe71ca1b6b.I97a968cb8be1f24f94652d9b110ecbf6af73f89e@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap() [+ + +]
Author: Dmitry Antipov <dmantipov@yandex.ru>
Date:   Mon Aug 5 17:20:35 2024 +0300

    wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap()
    
    [ Upstream commit 786c5be9ac29a39b6f37f1fdd2ea59d0fe35d525 ]
    
    In 'ieee80211_beacon_get_ap()', free allocated skb in case of error
    returned by 'ieee80211_beacon_protect()'. Compile tested only.
    
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Link: https://patch.msgid.link/20240805142035.227847-1-dmantipov@yandex.ru
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency [+ + +]
Author: Michael Kelley <mhklinux@outlook.com>
Date:   Wed Jun 5 19:55:59 2024 -0700

    x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency
    
    [ Upstream commit 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e ]
    
    A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if
    available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux
    doesn't unnecessarily do refined TSC calibration when setting up the TSC
    clocksource.
    
    With this change, a message such as this is no longer output during boot
    when the TSC is used as the clocksource:
    
    [    1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz
    
    Furthermore, the guest and host will have exactly the same view of the
    TSC frequency, which is important for features such as the TSC deadline
    timer that are emulated by the Hyper-V host.
    
    Signed-off-by: Michael Kelley <mhklinux@outlook.com>
    Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
    Link: https://lore.kernel.org/r/20240606025559.1631-1-mhklinux@outlook.com
    Signed-off-by: Wei Liu <wei.liu@kernel.org>
    Message-ID: <20240606025559.1631-1-mhklinux@outlook.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 
x86/mm: Switch to new Intel CPU model defines [+ + +]
Author: Tony Luck <tony.luck@intel.com>
Date:   Wed Apr 24 11:15:18 2024 -0700

    x86/mm: Switch to new Intel CPU model defines
    
    commit 2eda374e883ad297bd9fe575a16c1dc850346075 upstream.
    
    New CPU #defines encode vendor and family as well as model.
    
    [ dhansen: vertically align 0's in invlpg_miss_ids[] ]
    
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
    [ Ricardo: I used the old match macro X86_MATCH_INTEL_FAM6_MODEL()
      instead of X86_MATCH_VFM() as in the upstream commit.
      I also kept the ALDERLAKE_N name instead of ATOM_GRACEMONT. Both refer
      to the same CPU model. ]
    Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
    Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 
xfs: block reservation too large for minleft allocation [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:32 2024 -0700

    xfs: block reservation too large for minleft allocation
    
    [ Upstream commit d5753847b216db0e553e8065aa825cfe497ad143 ]
    
    When we enter xfs_bmbt_alloc_block() without having first allocated
    a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we
    are doing something like unwritten extent conversion, the transaction
    block reservation is used as the minleft value.
    
    This works for operations like unwritten extent conversion, but it
    assumes that the block reservation is only for a BMBT split. This is
    not always true, and sometimes results in larger than necessary
    minleft values being set. We only actually need enough space for a
    btree split, something we already handle correctly in
    xfs_bmapi_write() via the xfs_bmapi_minleft() calculation.
    
    We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to
    calculate the number of blocks a BMBT split on this inode is going to
    require, not use the transaction block reservation that contains the
    maximum number of blocks this transaction may consume in it...
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: buffer pins need to hold a buffer reference [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:36 2024 -0700

    xfs: buffer pins need to hold a buffer reference
    
    [ Upstream commit 89a4bf0dc3857569a77061d3d5ea2ac85f7e13c6 ]
    
    When a buffer is unpinned by xfs_buf_item_unpin(), we need to access
    the buffer after we've dropped the buffer log item reference count.
    This opens a window where we can have two racing unpins for the
    buffer item (e.g. shutdown checkpoint context callback processing
    racing with journal IO iclog completion processing) and both attempt
    to access the buffer after dropping the BLI reference count.  If we
    are unlucky, the "BLI freed" context wins the race and frees the
    buffer before the "BLI still active" case checks the buffer pin
    count.
    
    This results in a use after free that can only be triggered
    in active filesystem shutdown situations.
    
    To fix this, we need to ensure that buffer existence extends beyond
    the BLI reference count checks and until the unpin processing is
    complete. This implies that a buffer pin operation must also take a
    buffer reference to ensure that the buffer cannot be freed until the
    buffer unpin processing is complete.
    
    Reported-by: yangerkun <yangerkun@huawei.com>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
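
    The rule that a pin must also hold a buffer reference can be sketched
    as below. This is a single-threaded illustration only: struct buf,
    buf_pin() and buf_unpin() are hypothetical names, the counters are
    plain ints rather than atomics, and none of it is the XFS buffer code.

```c
/* Hypothetical buffer; not the XFS xfs_buf structure. */
struct buf {
	int refcount;	/* buffer may be freed when this reaches zero */
	int pin_count;	/* outstanding pins */
};

/* Pinning also takes a reference, so the buffer outlives the pin. */
void buf_pin(struct buf *b)
{
	b->refcount++;
	b->pin_count++;
}

/*
 * Unpin, then drop the pin's reference as the very last step.
 * Returns 1 when that was the final reference and the caller may free.
 */
int buf_unpin(struct buf *b)
{
	b->pin_count--;
	/* ... unpin processing may still touch *b here ... */
	return --b->refcount == 0;
}
```

    Because the reference is dropped last, any other holder that releases
    its own reference first cannot free the buffer while unpin processing
    is still inspecting it.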

xfs: collect errors from inodegc for unlinked inode recovery [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:39 2024 -0700

    xfs: collect errors from inodegc for unlinked inode recovery
    
    [ Upstream commit d4d12c02bf5f768f1b423c7ae2909c5afdfe0d5f ]
    
    Unlinked list recovery requires that errors from removing the inode
    from the unlinked list get fed back to the main recovery loop. Now that
    we offload the unlinking to the inodegc work, we don't get errors
    being fed back when we trip over a corruption that prevents the
    inode from being removed from the unlinked list.
    
    This means we never clear the corrupt unlinked list bucket,
    resulting in runtime operations eventually tripping over it and
    shutting down.
    
    Fix this by collecting inodegc worker errors and feeding them
    back to the flush caller. This is largely best effort - the only
    context that really cares is log recovery, and it only flushes a
    single inode at a time so we don't need complex synchronised
    handling. Essentially the inodegc workers will capture the first
    error that occurs and the next flush will gather them and clear
    them. The flush itself will only report the first gathered error.
    
    In the cases where callers can return errors, propagate the
    collected inodegc flush error up the error handling chain.
    
    In the case of inode unlinked list recovery, there are several
    superfluous calls to flush queued unlinked inodes -
    xlog_recover_iunlink_bucket() guarantees that it has flushed the
    inodegc and collected errors before it returns. Hence nothing in the
    calling path needs to run a flush, even when an error is returned.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
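
    The "first error wins" collection described above can be sketched as
    follows. This is a simplified, single-threaded model: struct gc_worker,
    gc_worker_record() and gc_flush() are hypothetical names and do not
    correspond to the XFS inodegc structures or their locking.

```c
#include <stddef.h>

/* Hypothetical per-worker state; not the XFS inodegc structures. */
struct gc_worker {
	int error;	/* first error seen since the last flush, 0 if none */
};

/* Worker side: remember only the first error that occurs. */
void gc_worker_record(struct gc_worker *w, int error)
{
	if (error && !w->error)
		w->error = error;
}

/* Flush side: gather and clear every worker's error, report the first. */
int gc_flush(struct gc_worker *workers, size_t nr)
{
	int first = 0;

	for (size_t i = 0; i < nr; i++) {
		if (workers[i].error && !first)
			first = workers[i].error;
		workers[i].error = 0;
	}
	return first;
}
```

    A second flush with no new failures returns 0, because the first flush
    has already gathered and cleared the stored errors.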

xfs: correct calculation for agend and blockcount [+ + +]
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Tue Sep 24 11:38:46 2024 -0700

    xfs: correct calculation for agend and blockcount
    
    [ Upstream commit 3c90c01e49342b166e5c90ec2c85b220be15a20e ]
    
    The agend should be "start + length - 1", then, blockcount should be
    "end + 1 - start".  Correct 2 calculation mistakes.
    
    Also, rename "agend" to "range_agend" because it's not the end of the AG
    per se; it's the end of the dead region within an AG's agblock space.
    
    Fixes: 5cf32f63b0f4 ("xfs: fix the calculation for "end" and "length"")
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
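
    The two corrected formulas treat the end of the range as inclusive. A
    short sketch of that arithmetic, using the hypothetical helper names
    range_end() and range_blockcount() rather than anything from XFS:

```c
#include <stdint.h>

/* Last block covered by 'length' blocks starting at 'start' (length >= 1). */
uint64_t range_end(uint64_t start, uint64_t length)
{
	return start + length - 1;
}

/* Number of blocks in the inclusive range [start, end]. */
uint64_t range_blockcount(uint64_t start, uint64_t end)
{
	return end + 1 - start;
}
```

    For example, 8 blocks starting at block 100 end at block 107, and the
    inclusive range 100..107 contains 8 blocks, so the two helpers are
    inverses of each other.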

xfs: defered work could create precommits [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:37 2024 -0700

    xfs: defered work could create precommits
    
    [ Upstream commit cb042117488dbf0b3b38b05771639890fada9a52 ]
    
    To fix an AGI-AGF-inode cluster buffer deadlock, we need to move
    inode cluster buffer operations to the ->iop_precommit() method.
    However, this means that deferred operations can require precommits
    to be run on the final transaction that the deferred ops pass back
    to xfs_trans_commit() context. This will be exposed by attribute
    handling, in that the last changes to the inode in the attr set
    state machine "disappear" because the precommit operation is not run.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: don't use BMBT btree split workers for IO completion [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:29 2024 -0700

    xfs: don't use BMBT btree split workers for IO completion
    
    [ Upstream commit c85007e2e3942da1f9361e4b5a9388ea3a8dcc5b ]
    
    When we split a BMBT due to record insertion, we offload it to a
    worker thread because we can be deep in the stack when we try to
    allocate a new block for the BMBT. Allocation can use several
    kilobytes of stack (full memory reclaim, swap and/or IO path can
    end up on the stack during allocation) and we can already be several
    kilobytes deep in the stack when we need to split the BMBT.
    
    A recent workload demonstrated a deadlock in this BMBT split
    offload. It requires several things to happen at once:
    
    1. two inodes need a BMBT split at the same time, one must be
    unwritten extent conversion from IO completion, the other must be
    from extent allocation.
    
    2. there must be no xfs_alloc_wq worker threads available in the
    worker pool.
    
    3. There must be sustained severe memory shortages such that new
    kworker threads cannot be allocated to the xfs_alloc_wq pool for
    both threads that need split work to be run
    
    4. The split work from the unwritten extent conversion must run
    first.
    
    5. when the BMBT block allocation runs from the split work, it must
    loop over all AGs and not be able to either trylock an AGF
    successfully, or each AGF it is able to lock has no space available
    for a single block allocation.
    
    6. The BMBT allocation must then attempt to lock the AGF that the
    second task queued to the rescuer thread already has locked before
    it finds an AGF it can allocate from.
    
    At this point, we have an ABBA deadlock between tasks queued on the
    xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task
    holding the AGF lock can't be run by the rescuer thread until the
    task the rescuer thread is running gets the AGF lock....
    
    This is a highly improbable series of events, but there it is.
    
    There are a couple of ways to fix this, but the easiest way is to ensure
    that we only punt tasks with a locked AGF that holds enough space
    for the BMBT block allocations to the worker thread.
    
    This works for unwritten extent conversion in IO completion (which
    doesn't have a locked AGF and space reservations) because we have
    tight control over the IO completion stack. It is typically only 6
    functions deep when xfs_btree_split() is called because we've
    already offloaded the IO completion work to a worker thread and
    hence we don't need to worry about stack overruns here.
    
    The other place we can be called for a BMBT split without a
    preceding allocation is __xfs_bunmapi() when punching out the
    center of an existing extent. We don't remove extents in the IO
    path, so these operations don't tend to be called with a lot of
    stack consumed. Hence we don't really need to ship the split off to
    a worker thread in these cases, either.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:26 2024 -0700

    xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
    
    [ Upstream commit 52f31ed228212ba572c44e15e818a3a5c74122c0 ]
    
    Resulting in a UAF if the shrinker races with some other dquot
    freeing mechanism that sets XFS_DQFLAG_FREEING before the dquot is
    removed from the LRU. This can occur if a dquot purge races with
    drop_caches.
    
    Reported-by: syzbot+912776840162c13db1a3@syzkaller.appspotmail.com
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix ag count overflow during growfs [+ + +]
Author: Long Li <leo.lilong@huaweicloud.com>
Date:   Tue Sep 24 11:38:40 2024 -0700

    xfs: fix ag count overflow during growfs
    
    [ Upstream commit c3b880acadc95d6e019eae5d669e072afda24f1b ]
    
    I found a corruption during growfs:
    
     XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of
       file fs/xfs/libxfs/xfs_alloc.c.  Caller __xfs_free_extent+0x28e/0x3c0
     CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
     Call Trace:
      <TASK>
      dump_stack_lvl+0x50/0x70
      xfs_corruption_error+0x134/0x150
      __xfs_free_extent+0x2c1/0x3c0
      xfs_ag_extend_space+0x291/0x3e0
      xfs_growfs_data+0xd72/0xe90
      xfs_file_ioctl+0x5f9/0x14a0
      __x64_sys_ioctl+0x13e/0x1c0
      do_syscall_64+0x39/0x80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
     XFS (loop0): Corruption detected. Unmount and run xfs_repair
     XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file
       fs/xfs/xfs_trans.c.  Caller xfs_growfs_data+0x691/0xe90
     CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
     Call Trace:
      <TASK>
      dump_stack_lvl+0x50/0x70
      xfs_error_report+0x93/0xc0
      xfs_trans_cancel+0x2c0/0x350
      xfs_growfs_data+0x691/0xe90
      xfs_file_ioctl+0x5f9/0x14a0
      __x64_sys_ioctl+0x13e/0x1c0
      do_syscall_64+0x39/0x80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
     RIP: 0033:0x7f2d86706577
    
    The bug can be reproduced with the following sequence:
    
     # truncate -s  1073741824 xfs_test.img
     # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img
     # truncate -s 2305843009213693952  xfs_test.img
     # mount -o loop xfs_test.img /mnt/test
     # xfs_growfs -D  1125899907891200  /mnt/test
    
    The root cause is that, during growfs, user space passed a very large
    value of newblocks to xfs_growfs_data_private(). Because the current
    sb_agblocks is small, the new AG count exceeds UINT_MAX. The AG number
    type is unsigned int, so the count overflows and nagcount ends up much
    smaller than the actual value. When the AG space is extended, the delta
    blocks in xfs_resizefs_init_new_ags() are then much larger than the
    actual value because of the incorrect nagcount, and can even exceed
    UINT_MAX. This causes corruption, which is detected in
    __xfs_free_extent. Fix it by growing the filesystem up to the maximally
    allowed number of AGs, rather than returning EINVAL, when the new AG
    count overflows.
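    
    The overflow can be checked with the numbers from the reproducer. This
    is a minimal userspace sketch, not the kernel code: it assumes the
    1 GiB image with 1024-byte blocks and agcount=4 yields 262144 blocks
    per AG, and SKETCH_MAX_AGCOUNT and both function names are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap for illustration; the real limit is defined by XFS. */
#define SKETCH_MAX_AGCOUNT UINT32_MAX

/* Buggy shape: the 64-bit quotient is silently truncated to 32 bits. */
uint32_t agcount_truncated(uint64_t nblocks, uint32_t agblocks)
{
    return (uint32_t)(nblocks / agblocks);
}

/* Safer shape: keep the quotient in 64 bits and clamp before narrowing. */
uint32_t agcount_clamped(uint64_t nblocks, uint32_t agblocks)
{
    uint64_t n = nblocks / agblocks;

    if (n > SKETCH_MAX_AGCOUNT)
        n = SKETCH_MAX_AGCOUNT;
    return (uint32_t)n;
}
```

    With nblocks = 1125899907891200 and agblocks = 262144 the true quotient
    is 4294967300, which does not fit in 32 bits, so the truncating version
    returns 4 while the clamped version returns the cap.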
    
    Signed-off-by: Long Li <leo.lilong@huawei.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix AGF vs inode cluster buffer deadlock [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:38 2024 -0700

    xfs: fix AGF vs inode cluster buffer deadlock
    
    [ Upstream commit 82842fee6e5979ca7e2bf4d839ef890c22ffb7aa ]
    
    Lock order in XFS is AGI -> AGF, hence for operations involving
    inode unlinked list operations we always lock the AGI first. Inode
    unlinked list operations operate on the inode cluster buffer,
    so the lock order there is AGI -> inode cluster buffer.
    
    For O_TMPFILE operations, this now means the lock order set down in
    xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the
    unlinked ops are done before the directory modifications that may
    allocate space and lock the AGF.
    
    Unfortunately, we also now lock the inode cluster buffer when
    logging an inode so that we can attach the inode to the cluster
    buffer and pin it in memory. This creates a lock order of AGF ->
    inode cluster buffer in directory operations as we have to log the
    inode after we've allocated new space for it.
    
    This creates a lock inversion between the AGF and the inode cluster
    buffer. Because the inode cluster buffer is shared across multiple
    inodes, the inversion is not specific to individual inodes but can
    occur when inodes in the same cluster buffer are accessed in
    different orders.
    
    To fix this we need to move all the inode log item cluster buffer
    interactions to the end of the current transaction. Unfortunately,
    xfs_trans_log_inode() calls are littered throughout the transactions
    with no thought to ordering against other items or locking. This
    makes it difficult to do anything that involves changing the call
    sites of xfs_trans_log_inode() to change locking orders.
    
    However, we do now have a mechanism that allows us to postpone dirty
    item processing to just before we commit the transaction: the
    ->iop_precommit method. This will be called after all the
    modifications are done and high level objects like AGI and AGF
    buffers have been locked and modified, thereby providing a mechanism
    that guarantees we don't lock the inode cluster buffer before those
    high level objects are locked.
    
    This change is largely moving the guts of xfs_trans_log_inode() to
    xfs_inode_item_precommit() and providing an extra flag context in
    the inode log item to track the dirty state of the inode in the
    current transaction. This also means we do a lot less repeated work
    in xfs_trans_log_inode() by only doing it once per transaction when
    all the work is done.
    
    Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item")
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix BUG_ON in xfs_getbmap() [+ + +]
Author: Ye Bin <yebin10@huawei.com>
Date:   Tue Sep 24 11:38:35 2024 -0700

    xfs: fix BUG_ON in xfs_getbmap()
    
    [ Upstream commit 8ee81ed581ff35882b006a5205100db0b57bf070 ]
    
    The following issue was observed:
    XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329
    ------------[ cut here ]------------
    kernel BUG at fs/xfs/xfs_message.c:102!
    invalid opcode: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422
    RIP: 0010:assfail+0x96/0xa0
    RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000
    RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002
    RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000
    R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0
    R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c
    FS:  00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     xfs_getbmap+0x1a5b/0x1e40
     xfs_ioc_getbmap+0x1fd/0x5b0
     xfs_file_ioctl+0x2cb/0x1d50
     __x64_sys_ioctl+0x197/0x210
     do_syscall_64+0x39/0xb0
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    The above issue may happen as follows:
             ThreadA                       ThreadB
    do_shared_fault
     __do_fault
      xfs_filemap_fault
       __xfs_filemap_fault
        filemap_fault
                                 xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag
                                  xfs_getbmap
                                   xfs_ilock(ip, XFS_IOLOCK_SHARED);
                                   filemap_write_and_wait
     do_page_mkwrite
      xfs_filemap_page_mkwrite
       __xfs_filemap_fault
        xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
        iomap_page_mkwrite
         ...
         xfs_buffered_write_iomap_begin
          xfs_bmapi_reserve_delalloc -> Allocate delay extent
                                  xfs_ilock_data_map_shared(ip)
                                  xfs_getbmap_report_one
                                   ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0)
                                    -> trigger BUG_ON
    
    As xfs_filemap_page_mkwrite() only holds the XFS_MMAPLOCK_SHARED lock,
    there is a small window in which mkwrite can produce a delalloc extent
    after the file has been written back in xfs_getbmap(). To solve the
    above issue, just skip delalloc extents.
    
    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: Fix deadlock on xfs_inodegc_worker [+ + +]
Author: Wu Guanghao <wuguanghao3@huawei.com>
Date:   Tue Sep 24 11:38:27 2024 -0700

    xfs: Fix deadlock on xfs_inodegc_worker
    
    [ Upstream commit 4da112513c01d7d0acf1025b8764349d46e177d6 ]
    
    While testing the deletion of a large number of files under low
    memory, we found a deadlock.
    
    [ 1240.279183] -> #1 (fs_reclaim){+.+.}-{0:0}:
    [ 1240.280450]        lock_acquire+0x197/0x460
    [ 1240.281548]        fs_reclaim_acquire.part.0+0x20/0x30
    [ 1240.282625]        kmem_cache_alloc+0x2b/0x940
    [ 1240.283816]        xfs_trans_alloc+0x8a/0x8b0
    [ 1240.284757]        xfs_inactive_ifree+0xe4/0x4e0
    [ 1240.285935]        xfs_inactive+0x4e9/0x8a0
    [ 1240.286836]        xfs_inodegc_worker+0x160/0x5e0
    [ 1240.287969]        process_one_work+0xa19/0x16b0
    [ 1240.289030]        worker_thread+0x9e/0x1050
    [ 1240.290131]        kthread+0x34f/0x460
    [ 1240.290999]        ret_from_fork+0x22/0x30
    [ 1240.291905]
    [ 1240.291905] -> #0 ((work_completion)(&gc->work)){+.+.}-{0:0}:
    [ 1240.293569]        check_prev_add+0x160/0x2490
    [ 1240.294473]        __lock_acquire+0x2c4d/0x5160
    [ 1240.295544]        lock_acquire+0x197/0x460
    [ 1240.296403]        __flush_work+0x6bc/0xa20
    [ 1240.297522]        xfs_inode_mark_reclaimable+0x6f0/0xdc0
    [ 1240.298649]        destroy_inode+0xc6/0x1b0
    [ 1240.299677]        dispose_list+0xe1/0x1d0
    [ 1240.300567]        prune_icache_sb+0xec/0x150
    [ 1240.301794]        super_cache_scan+0x2c9/0x480
    [ 1240.302776]        do_shrink_slab+0x3f0/0xaa0
    [ 1240.303671]        shrink_slab+0x170/0x660
    [ 1240.304601]        shrink_node+0x7f7/0x1df0
    [ 1240.305515]        balance_pgdat+0x766/0xf50
    [ 1240.306657]        kswapd+0x5bd/0xd20
    [ 1240.307551]        kthread+0x34f/0x460
    [ 1240.308346]        ret_from_fork+0x22/0x30
    [ 1240.309247]
    [ 1240.309247] other info that might help us debug this:
    [ 1240.309247]
    [ 1240.310944]  Possible unsafe locking scenario:
    [ 1240.310944]
    [ 1240.312379]        CPU0                    CPU1
    [ 1240.313363]        ----                    ----
    [ 1240.314433]   lock(fs_reclaim);
    [ 1240.315107]                                lock((work_completion)(&gc->work));
    [ 1240.316828]                                lock(fs_reclaim);
    [ 1240.318088]   lock((work_completion)(&gc->work));
    [ 1240.319203]
    [ 1240.319203]  *** DEADLOCK ***
    ...
    [ 2438.431081] Workqueue: xfs-inodegc/sda xfs_inodegc_worker
    [ 2438.432089] Call Trace:
    [ 2438.432562]  __schedule+0xa94/0x1d20
    [ 2438.435787]  schedule+0xbf/0x270
    [ 2438.436397]  schedule_timeout+0x6f8/0x8b0
    [ 2438.445126]  wait_for_completion+0x163/0x260
    [ 2438.448610]  __flush_work+0x4c4/0xa40
    [ 2438.455011]  xfs_inode_mark_reclaimable+0x6ef/0xda0
    [ 2438.456695]  destroy_inode+0xc6/0x1b0
    [ 2438.457375]  dispose_list+0xe1/0x1d0
    [ 2438.458834]  prune_icache_sb+0xe8/0x150
    [ 2438.461181]  super_cache_scan+0x2b3/0x470
    [ 2438.461950]  do_shrink_slab+0x3cf/0xa50
    [ 2438.462687]  shrink_slab+0x17d/0x660
    [ 2438.466392]  shrink_node+0x87e/0x1d40
    [ 2438.467894]  do_try_to_free_pages+0x364/0x1300
    [ 2438.471188]  try_to_free_pages+0x26c/0x5b0
    [ 2438.473567]  __alloc_pages_slowpath.constprop.136+0x7aa/0x2100
    [ 2438.482577]  __alloc_pages+0x5db/0x710
    [ 2438.485231]  alloc_pages+0x100/0x200
    [ 2438.485923]  allocate_slab+0x2c0/0x380
    [ 2438.486623]  ___slab_alloc+0x41f/0x690
    [ 2438.490254]  __slab_alloc+0x54/0x70
    [ 2438.491692]  kmem_cache_alloc+0x23e/0x270
    [ 2438.492437]  xfs_trans_alloc+0x88/0x880
    [ 2438.493168]  xfs_inactive_ifree+0xe2/0x4e0
    [ 2438.496419]  xfs_inactive+0x4eb/0x8b0
    [ 2438.497123]  xfs_inodegc_worker+0x16b/0x5e0
    [ 2438.497918]  process_one_work+0xbf7/0x1a20
    [ 2438.500316]  worker_thread+0x8c/0x1060
    [ 2438.504938]  ret_from_fork+0x22/0x30
    
    When memory is insufficient, a memory allocation in xfs_inodegc_worker
    will trigger memory reclaim, and reclaim may then call flush_work() to
    wait for that same work to complete. This causes a deadlock.
    
    So use memalloc_nofs_save() to avoid triggering memory reclamation in
    xfs_inodegc_worker.
    
    Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix extent busy updating [+ + +]
Author: Wengang Wang <wen.gang.wang@oracle.com>
Date:   Tue Sep 24 11:38:28 2024 -0700

    xfs: fix extent busy updating
    
    [ Upstream commit 601a27ea09a317d0fe2895df7d875381fb393041 ]
    
    In cases 6 and 7 of xfs_extent_busy_update_extent(), whenever bno is
    modified on a busy extent, the relevant length has to be modified
    accordingly.
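    
    The invariant can be sketched in userspace as follows. This is an
    illustration under assumptions, not the kernel's data structure:
    struct busy_extent and busy_extent_trim_front() are hypothetical names,
    and the extent end is treated as exclusive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a busy extent: start block and block count. */
struct busy_extent {
    uint64_t bno;
    uint64_t length;
};

/*
 * Move the start of the extent forward to new_bno while keeping its end
 * fixed. The caller must ensure bno <= new_bno <= bno + length.
 */
void busy_extent_trim_front(struct busy_extent *e, uint64_t new_bno)
{
    uint64_t end = e->bno + e->length;  /* exclusive end, unchanged */

    e->bno = new_bno;
    e->length = end - new_bno;          /* length follows the new start */
}
```

    Updating bno alone would leave the extent describing blocks past its
    original end, which is the kind of inconsistency the fix avoids.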
    
    Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix low space alloc deadlock [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:30 2024 -0700

    xfs: fix low space alloc deadlock
    
    [ Upstream commit 1dd0510f6d4b85616a36aabb9be38389467122d9 ]
    
    I've recently encountered an ABBA deadlock with g/476. The upcoming
    changes seem to make this much easier to hit, but the underlying
    problem is a pre-existing one.
    
    Essentially, if we select an AG for allocation, then lock the AGF
    and then fail to allocate for some reason (e.g. minimum length
    requirements cannot be satisfied), then we drop out of the
    allocation with the AGF still locked.
    
    The caller then modifies the allocation constraints - usually
    loosening them up - and tries again. This can result in trying to
    access AGFs that are lower than the AGF we already have locked from
    the failed attempt. e.g. the failed attempt skipped several AGs
    before failing, so we have locked an AG higher than the start AG.
    Retrying the allocation from the start AG then causes us to violate
    AGF lock ordering and this can lead to deadlocks.
    
    The deadlock exists even if allocation succeeds - we can do
    followup allocations in the same transaction for BMBT blocks that
    aren't guaranteed to be in the same AG as the original, and can move
    into higher AGs. Hence we really need to move the tp->t_firstblock
    tracking down into xfs_alloc_vextent() where it can be set when we
    exit with a locked AG.
    
    xfs_alloc_vextent() can also check there if the requested
    allocation falls within the allowed range of AGs set by
    tp->t_firstblock. If we can't allocate within the range set, we have
    to fail the allocation. If we are allowed to do non-blocking AGF
    locking, we can ignore the AG locking order limitations as we can
    use try-locks for the first iteration over requested AG range.
    
    This invalidates a set of post allocation asserts that check that
    the allocation is always above tp->t_firstblock if it is set.
    Because we can use try-locks to avoid the deadlock in some
    circumstances, having a pre-existing locked AGF doesn't always
    prevent allocation from lower order AGFs. Hence those ASSERTs need
    to be removed.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix negative array access in xfs_getbmap [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Sep 24 11:38:44 2024 -0700

    xfs: fix negative array access in xfs_getbmap
    
    [ Upstream commit 1bba82fe1afac69c85c1f5ea137c8e73de3c8032 ]
    
    In commit 8ee81ed581ff, Ye Bin complained about an ASSERT in the bmapx
    code that trips if we encounter a delalloc extent after flushing the
    pagecache to disk.  The ioctl code does not hold MMAPLOCK so it's
    entirely possible that a racing write page fault can create a delalloc
    extent after the file has been flushed.  The proposed solution was to
    replace the assertion with an early return that avoids filling out the
    bmap recordset with a delalloc entry if the caller didn't ask for it.
    
    At the time, I recall thinking that the forward logic sounded ok, but
    felt hesitant because I suspected that changing this code would cause
    something /else/ to burst loose due to some other subtlety.
    
    syzbot of course found that subtlety.  If all the extent mappings found
    after the flush are delalloc mappings, we'll reach the end of the data
    fork without ever incrementing bmv->bmv_entries.  This is new, since
    before we'd have emitted the delalloc mappings even though the caller
    didn't ask for them.  Once we reach the end, we'll try to set
    BMV_OF_LAST on the -1st entry (because bmv_entries is zero) and go
    corrupt something else in memory.  Yay.
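    
    The failure mode is an unguarded write to index count - 1 when count is
    zero. A minimal userspace sketch of the guard, assuming a hypothetical
    record type and flag value (struct map_rec, SKETCH_OF_LAST and
    mark_last_record() are not the kernel's names):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_OF_LAST 0x4u  /* hypothetical flag value for illustration */

/* Hypothetical stand-in for one output mapping record. */
struct map_rec {
    unsigned int oflags;
};

/*
 * Flag the final record as the last one, but only if any records were
 * emitted. Returns 1 if a record was flagged, 0 if the set was empty.
 */
int mark_last_record(struct map_rec *recs, size_t nr_entries)
{
    if (nr_entries == 0)
        return 0;  /* nothing emitted: there is no "-1st" entry to touch */

    recs[nr_entries - 1].oflags |= SKETCH_OF_LAST;
    return 1;
}
```

    Without the early return, nr_entries - 1 wraps around for an empty
    record set and the store lands outside the array.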
    
    I really dislike all these stupid patches that fiddle around with debug
    code and break things that otherwise worked well enough.  Nobody was
    complaining that calling XFS_IOC_BMAPX without BMV_IF_DELALLOC would
    return BMV_OF_DELALLOC records, and now we've gone from "weird behavior
    that nobody cared about" to "bad behavior that must be addressed
    immediately".
    
    Maybe I'll just ignore anything from Huawei from now on for my own sake.
    
    Reported-by: syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/
    Fixes: 8ee81ed581ff ("xfs: fix BUG_ON in xfs_getbmap()")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix reloading entire unlinked bucket lists [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Sep 24 11:38:50 2024 -0700

    xfs: fix reloading entire unlinked bucket lists
    
    [ Upstream commit 537c013b140d373d1ffe6290b841dc00e67effaa ]
    
    During review of the patchset that provided reloading of the incore
    iunlink list, Dave made a few suggestions, and I updated the copy in my
    dev tree.  Unfortunately, I then got distracted by ... who even knows
    what ... and forgot to backport those changes from my dev tree to my
    release candidate branch.  I then sent multiple pull requests with stale
    patches, and that's what was merged into -rc3.
    
    So.
    
    This patch re-adds the use of an unlocked iunlink list check to
    determine if we want to allocate the resources to recreate the incore
    list.  Since lost iunlinked inodes are supposed to be rare, this change
    helps us avoid paying the transaction and AGF locking costs every time
    we open any inode.
    
    This also re-adds the shutdowns on failure, and re-applies the
    restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and
    re-adds a requested comment about the quotachecking code.
    
    Retain the original RVB tag from Dave since there's no code change from
    the last submission.
    
    Fixes: 68b957f64fca1 ("xfs: load uncached unlinked inodes into memory on demand")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix the calculation for "end" and "length" [+ + +]
Author: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Date:   Tue Sep 24 11:38:42 2024 -0700

    xfs: fix the calculation for "end" and "length"
    
    [ Upstream commit 5cf32f63b0f4c520460c1a5dd915dc4f09085f29 ]
    
    The value of "end" should be "start + length - 1".
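    
    For an inclusive range the two conversions mirror each other. A minimal
    sketch, with hypothetical helper names and assuming length is nonzero:

```c
#include <assert.h>
#include <stdint.h>

/* Last unit covered by a range; the caller must pass length > 0. */
uint64_t range_end_inclusive(uint64_t start, uint64_t length)
{
    return start + length - 1;
}

/* Number of units covered when both start and end are inclusive. */
uint64_t range_length(uint64_t start, uint64_t end)
{
    return end - start + 1;
}
```

    A range starting at 100 with length 50 therefore ends at 149, not 150;
    using start + length as the end covers one unit too many.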
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix uninitialized variable access [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Sep 24 11:38:33 2024 -0700

    xfs: fix uninitialized variable access
    
    [ Upstream commit 60b730a40c43fbcc034970d3e77eb0f25b8cc1cf ]
    
    If the end position of a GETFSMAP query overlaps an allocated space and
    we're using the free space info to generate fsmap info, the akeys
    information gets fed into the fsmap formatter with bad results.
    Zero-init the space.
    
    Reported-by: syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: fix unlink vs cluster buffer instantiation race [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:45 2024 -0700

    xfs: fix unlink vs cluster buffer instantiation race
    
    [ Upstream commit 348a1983cf4cf5099fc398438a968443af4c9f65 ]
    
    Luis has been reporting an assert failure when freeing an inode
    cluster during inode inactivation for a while. The assert looks
    like:
    
     XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241
     ------------[ cut here ]------------
     kernel BUG at fs/xfs/xfs_message.c:102!
     Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
     CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.10.0-rc1 #4
     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
     Workqueue: xfs-inodegc/loop5 xfs_inodegc_worker [xfs]
     RIP: 0010:assfail (fs/xfs/xfs_message.c:102) xfs
     RSP: 0018:ffff88810188f7f0 EFLAGS: 00010202
     RAX: 0000000000000000 RBX: ffff88816e748250 RCX: 1ffffffff844b0e7
     RDX: 0000000000000004 RSI: ffff88810188f558 RDI: ffffffffc2431fa0
     RBP: 1ffff11020311f01 R08: 0000000042431f9f R09: ffffed1020311e9b
     R10: ffff88810188f4df R11: ffffffffac725d70 R12: ffff88817a3f4000
     R13: ffff88812182f000 R14: ffff88810188f998 R15: ffffffffc2423f80
     FS:  0000000000000000(0000) GS:ffff8881c8400000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 000055fe9d0f109c CR3: 000000014426c002 CR4: 0000000000770ef0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
     PKRU: 55555554
     Call Trace:
      <TASK>
     xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:241 (discriminator 1)) xfs
     xfs_imap_to_bp (fs/xfs/xfs_trans.h:210 fs/xfs/libxfs/xfs_inode_buf.c:138) xfs
     xfs_inode_item_precommit (fs/xfs/xfs_inode_item.c:145) xfs
     xfs_trans_run_precommits (fs/xfs/xfs_trans.c:931) xfs
     __xfs_trans_commit (fs/xfs/xfs_trans.c:966) xfs
     xfs_inactive_ifree (fs/xfs/xfs_inode.c:1811) xfs
     xfs_inactive (fs/xfs/xfs_inode.c:2013) xfs
     xfs_inodegc_worker (fs/xfs/xfs_icache.c:1841 fs/xfs/xfs_icache.c:1886) xfs
     process_one_work (kernel/workqueue.c:3231)
     worker_thread (kernel/workqueue.c:3306 (discriminator 2) kernel/workqueue.c:3393 (discriminator 2))
     kthread (kernel/kthread.c:389)
     ret_from_fork (arch/x86/kernel/process.c:147)
     ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
      </TASK>
    
    And occurs when the inode precommit handler attempts to look
    up the inode cluster buffer to attach the inode for writeback.
    
    The trail of logic that I can reconstruct is as follows.
    
            1. the inode is clean when inodegc runs, so it is not
               attached to a cluster buffer when precommit runs.
    
            2. #1 implies the inode cluster buffer may be clean and not
               pinned by dirty inodes when inodegc runs.
    
            3. #2 implies that the inode cluster buffer can be reclaimed
               by memory pressure at any time.
    
            4. The assert failure implies that the cluster buffer was
               attached to the transaction, but not marked done. It had
               been accessed earlier in the transaction, but not marked
               done.
    
            5. #4 implies the cluster buffer has been invalidated (i.e.
               marked stale).
    
            6. #5 implies that the inode cluster buffer was instantiated
               uninitialised in the transaction in xfs_ifree_cluster(),
               which only instantiates the buffers to invalidate them
               and never marks them as done.
    
    Given factors 1-3, this issue is highly dependent on timing and
    environmental factors. Hence the issue can be very difficult to
    reproduce in some situations, but highly reliable in others. Luis
    has an environment where it can be reproduced easily by g/531 but,
    OTOH, I've reproduced it only once in ~2000 cycles of g/531.
    
    I think the fix is to have xfs_ifree_cluster() set the XBF_DONE flag
    on the cluster buffers, even though they may not be initialised. The
    reasons why I think this is safe are:
    
            1. A buffer cache lookup hit on a XBF_STALE buffer will
               clear the XBF_DONE flag. Hence all future users of the
               buffer know they have to re-initialise the contents
               before use and mark it done themselves.
    
            2. xfs_trans_binval() sets the XFS_BLI_STALE flag, which
               means the buffer remains locked until the journal commit
               completes and the buffer is unpinned. Hence once marked
               XBF_STALE/XFS_BLI_STALE by xfs_ifree_cluster(), the only
               context that can access the freed buffer is the currently
               running transaction.
    
            3. #2 implies that future buffer lookups in the currently
               running transaction will hit the transaction match code
               and not the buffer cache. Hence XBF_STALE and
               XFS_BLI_STALE will not be cleared unless the transaction
               initialises and logs the buffer with valid contents
               again. At which point, the buffer will be marked
               XBF_DONE again, so having XBF_DONE already set on the
               stale buffer is a moot point.
    
            4. #2 also implies that any concurrent access to that
               cluster buffer will block waiting on the buffer lock
               until the inode cluster has been fully freed and is no
               longer an active inode cluster buffer.
    
            5. #4 + #1 means that any future user of the disk range of
               that buffer will always see the range of disk blocks
               covered by the cluster buffer as not done, and hence must
               initialise the contents themselves.
    
            6. Setting XBF_DONE in xfs_ifree_cluster() then means the
               unlinked inode precommit code will see a XBF_DONE buffer
               from the transaction match as it expects. It can then
               attach the stale but newly dirtied inode to the stale
               but newly dirtied cluster buffer without unexpected
               failures. The stale buffer will then sail through the
               journal and do the right thing with the attached stale
               inode during unpin.
    
    Hence the fix is just one line of extra code. The explanation of
    why we have to set XBF_DONE in xfs_ifree_cluster, OTOH, is long and
    complex....
    
    Fixes: 82842fee6e59 ("xfs: fix AGF vs inode cluster buffer deadlock")
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Tested-by: Luis Chamberlain <mcgrof@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: journal geometry is not properly bounds checked [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Sep 25 17:57:05 2024 -0700

    xfs: journal geometry is not properly bounds checked
    
    [ Upstream commit f1e1765aad7de7a8b8102044fc6a44684bc36180 ]
    
    If the journal geometry results in a sector or log stripe unit
    validation problem, it indicates that we cannot set the log up to
    safely write to the journal. In these cases, we must abort the
    mount because the corruption needs external intervention to resolve.
    Similarly, a journal that is too large cannot be written to safely,
    either, so we shouldn't allow those geometries to mount, either.
    
    If the log is too small, we risk having transaction reservations
    overrunning the available log space and the system hanging waiting
    for space it can never provide. This is purely a runtime hang issue,
    not a corruption issue as per the first cases listed above. We abort
    mounts if the log is too small for V5 filesystems, but we must allow
    v4 filesystems to mount because, historically, there was no log size
    validity checking and so some systems may still be out there with
    undersized logs.
    
    The problem is that on V4 filesystems, when we discover a log
    geometry problem, we skip all the remaining checks and then allow
    the log to continue mounting. This means that if one of the log size
    checks fails, we skip the log stripe unit check. i.e. we allow the
    mount because a "non-fatal" geometry is violated, and then fail to
    check the hard fail geometries that should fail the mount.
    
    Move all these fatal checks to the superblock verifier, and add a
    new check for the two log sector size geometry variables having the
    same values. This will prevent any attempt to mount a log that has
    invalid or inconsistent geometries long before we attempt to mount
    the log.
    
    However, for the minimum log size checks, we can only do that once
    we've set up the log and calculated all the iclog sizes and
    roundoffs. Hence this needs to remain in the log mount code after
    the log has been initialised. It is also the only case where we
    should allow a v4 filesystem to continue running, so leave that
    handling in place, too.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: load uncached unlinked inodes into memory on demand [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Sep 24 11:38:43 2024 -0700

    xfs: load uncached unlinked inodes into memory on demand
    
    [ Upstream commit 68b957f64fca1930164bfc6d6d379acdccd547d7 ]
    
    shrikanth hegde reports that filesystems fail shortly after mount with
    the following failure:
    
            WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs]
    
    This of course is the WARN_ON_ONCE in xfs_iunlink_lookup:
    
            ip = radix_tree_lookup(&pag->pag_ici_root, agino);
            if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... }
    
    From diagnostic data collected by the bug reporters, it would appear
    that we cleanly mounted a filesystem that contained unlinked inodes.
    Unlinked inodes are only processed as a final step of log recovery,
    which means that clean mounts do not process the unlinked list at all.
    
    Prior to the introduction of the incore unlinked lists, this wasn't a
    problem because the unlink code would (very expensively) traverse the
    entire ondisk metadata iunlink chain to keep things up to date.
    However, the incore unlinked list code complains when it realizes that
    it is out of sync with the ondisk metadata and shuts down the fs, which
    is bad.
    
    Ritesh proposed to solve this problem by unconditionally parsing the
    unlinked lists at mount time, but this imposes a mount time cost for
    every filesystem to catch something that should be very infrequent.
    Instead, let's target the places where we can encounter a next_unlinked
    pointer that refers to an inode that is not in cache, and load it into
    cache.
    
    Note: This patch does not address the problem of iget loading an inode
    from the middle of the iunlink list and needing to set i_prev_unlinked
    correctly.
    
    Reported-by: shrikanth hegde <sshegde@linux.vnet.ibm.com>
    Triaged-by: Ritesh Harjani <ritesh.list@gmail.com>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: make inode unlinked bucket recovery work with quotacheck [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Sep 24 11:38:49 2024 -0700

    xfs: make inode unlinked bucket recovery work with quotacheck
    
    [ Upstream commit 49813a21ed57895b73ec4ed3b99d4beec931496f ]
    
    Teach quotacheck to reload the unlinked inode lists when walking the
    inode table.  This requires extra state handling, since it's possible
    that a reloaded inode will get inactivated before quotacheck tries to
    scan it; in this case, we need to ensure that the reloaded inode does
    not have dquots attached when it is freed.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: prefer free inodes at ENOSPC over chunk allocation [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:31 2024 -0700

    xfs: prefer free inodes at ENOSPC over chunk allocation
    
    [ Upstream commit f08f984c63e9980614ae3a0a574b31eaaef284b2 ]
    
    When an XFS filesystem has free inodes in chunks already allocated
    on disk, it will still allocate new inode chunks if the target AG
    has no free inodes in it. Normally, this is a good idea as it
    preserves locality of all the inodes in a given directory.
    
    However, at ENOSPC this can lead to using the last few remaining
    free filesystem blocks to allocate a new chunk when there are many,
    many free inodes that could be allocated without consuming free
    space. This results in speeding up the consumption of the last few
    blocks and inode create operations then returning ENOSPC when there
    are free inodes available because we don't have enough blocks left in
    the filesystem for directory creation reservations to proceed.
    
    Hence when we are near ENOSPC, we should be attempting to preserve
    the remaining blocks for directory block allocation rather than
    using them for unnecessary inode chunk creation.
    
    This particular behaviour is exposed by xfs/294, when it drives to
    ENOSPC on empty file creation whilst there are still thousands of
    free inodes available for allocation in other AGs in the filesystem.
    
    Hence, when we are within 1% of ENOSPC, change the inode allocation
    behaviour to prefer to use existing free inodes over allocating new
    inode chunks, even though it results in poorer locality of the data
    set. It is more important for the allocations to be space efficient
    near ENOSPC than to have optimal locality for performance, so let's
    modify the inode AG selection code to reflect that fact.
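
    A minimal sketch of that policy, assuming a simple "free blocks at or
    below 1% of total blocks" test. The function names, the enum, and the
    exact comparison are hypothetical and only illustrate the decision,
    not the kernel's AG selection code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when free space is at or below 1% of the filesystem,
 * i.e. when existing free inodes should be preferred over allocating
 * a new inode chunk.
 */
static bool toy_prefer_free_inodes(uint64_t free_blocks, uint64_t total_blocks)
{
    assert(free_blocks <= total_blocks);
    /* Divide the total instead of multiplying free_blocks to avoid overflow. */
    return free_blocks <= total_blocks / 100;
}

enum toy_choice { TOY_USE_FREE_INODE, TOY_ALLOC_CHUNK, TOY_TRY_NEXT_AG };

/* Decide what to do in the target AG given its free inode count. */
static enum toy_choice toy_pick(uint64_t ag_free_inodes, bool low_space)
{
    if (ag_free_inodes > 0)
        return TOY_USE_FREE_INODE;
    /* Near ENOSPC, look for free inodes elsewhere and keep the blocks. */
    return low_space ? TOY_TRY_NEXT_AG : TOY_ALLOC_CHUNK;
}
```

    With plenty of free space an AG without free inodes still gets a new
    chunk, which preserves locality; only the low-space case gives that up.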
    
    This allows generic/294 to not only pass with this allocator rework
    patchset, but to increase the number of post-ENOSPC empty inode
    allocations from ~600 to ~9080 before we hit ENOSPC on the
    directory create transaction reservation.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: quotacheck failure can race with background inode inactivation [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:34 2024 -0700

    xfs: quotacheck failure can race with background inode inactivation
    
    [ Upstream commit 0c7273e494dd5121e20e160cb2f047a593ee14a8 ]
    
    The background inode inactivation can attach dquots to inodes, but
    this can race with a foreground quotacheck failure that leads to
    disabling quotas and freeing the mp->m_quotainfo structure. The
    background inode inactivation then tries to allocate a quota, tries
    to dereference mp->m_quotainfo, and crashes like so:
    
    XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas.
    xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff)
    BUG: kernel NULL pointer dereference, address: 00000000000002a8
    ....
    CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
    Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker
    RIP: 0010:xfs_dquot_alloc+0x95/0x1e0
    ....
    Call Trace:
     <TASK>
     xfs_qm_dqread+0x46/0x440
     xfs_qm_dqget_inode+0x154/0x500
     xfs_qm_dqattach_one+0x142/0x3c0
     xfs_qm_dqattach_locked+0x14a/0x170
     xfs_qm_dqattach+0x52/0x80
     xfs_inactive+0x186/0x340
     xfs_inodegc_worker+0xd3/0x430
     process_one_work+0x3b1/0x960
     worker_thread+0x52/0x660
     kthread+0x161/0x1a0
     ret_from_fork+0x29/0x50
     </TASK>
    ....
    
    Prevent this race by flushing all the pending background inode
    inactivations before purging all the cached dquots when quotacheck
    fails.
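
    The ordering can be modelled in a few lines of userspace C. This is a
    single-threaded toy, with hypothetical types and function names, that
    only shows why the queued work must be drained before the quota
    structure is freed; it does not reproduce the kernel's workqueue code.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_quotainfo { int dquots_attached; };

struct toy_mount {
    struct toy_quotainfo *quotainfo;   /* NULL once quotas are disabled */
    int pending_inactivations;         /* queued background work items */
};

/* One background inactivation: attaches a dquot, so it needs quotainfo. */
static void toy_inactivate_one(struct toy_mount *mp)
{
    assert(mp->quotainfo != NULL);     /* models the NULL dereference */
    mp->quotainfo->dquots_attached++;
}

/* Run every queued inactivation to completion. */
static void toy_flush_inactivations(struct toy_mount *mp)
{
    while (mp->pending_inactivations > 0) {
        toy_inactivate_one(mp);
        mp->pending_inactivations--;
    }
}

/* Quotacheck failure path: drain background work first, then tear down. */
static void toy_quotacheck_failed(struct toy_mount *mp)
{
    toy_flush_inactivations(mp);
    free(mp->quotainfo);
    mp->quotainfo = NULL;
}
```

    Swapping the two steps in toy_quotacheck_failed() would leave queued
    work that trips the assertion, which is the toy equivalent of the
    crash in the trace.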
    
    Reported-by: Pengfei Xu <pengfei.xu@intel.com>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: reload entire unlinked bucket lists [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Sep 24 11:38:48 2024 -0700

    xfs: reload entire unlinked bucket lists
    
    [ Upstream commit 83771c50e42b92de6740a63e152c96c052d37736 ]
    
    The previous patch to reload unrecovered unlinked inodes when adding a
    newly created inode to the unlinked list is missing a key piece of
    functionality.  It doesn't handle the case that someone calls xfs_iget
    on an inode that is not the last item in the incore list.  For example,
    if at mount time the ondisk iunlink bucket looks like this:
    
    AGI -> 7 -> 22 -> 3 -> NULL
    
    None of these three inodes are cached in memory.  Now let's say that
    someone tries to open inode 3 by handle.  We need to walk the list to
    make sure that inodes 7 and 22 get loaded cold, and that the
    i_prev_unlinked of inode 3 gets set to 22.
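
    The walk can be sketched with a toy table indexed by inode number.
    The struct, the NULL_INO marker, and toy_reload_bucket() are
    hypothetical names for this example; the real code works on AGI
    buffers and incore inodes rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>

#define NULL_INO 0xffffffffu   /* hypothetical end-of-list marker */
#define MAX_INO  32u

/* Toy stand-in for an inode; not the real struct xfs_inode. */
struct toy_inode {
    unsigned int next_unlinked;   /* "ondisk" forward pointer */
    unsigned int prev_unlinked;   /* "incore" back pointer */
    int          cached;          /* nonzero once loaded into memory */
};

/*
 * Walk one bucket from its head, marking every inode as cached and
 * filling in the back pointers. Returns the number of inodes visited.
 */
static unsigned int toy_reload_bucket(struct toy_inode *tbl, unsigned int head)
{
    unsigned int prev = NULL_INO;
    unsigned int ino = head;
    unsigned int seen = 0;

    while (ino != NULL_INO) {
        assert(ino < MAX_INO);
        assert(seen < MAX_INO);   /* guard against a looping chain */
        tbl[ino].cached = 1;
        tbl[ino].prev_unlinked = prev;
        prev = ino;
        ino = tbl[ino].next_unlinked;
        seen++;
    }
    return seen;
}
```

    For the chain 7 -> 22 -> 3, starting the walk at 7 loads all three
    entries and leaves the back pointer of inode 3 set to 22.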
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: remove WARN when dquot cache insertion fails [+ + +]
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Sep 24 11:38:41 2024 -0700

    xfs: remove WARN when dquot cache insertion fails
    
    [ Upstream commit 4b827b3f305d1fcf837265f1e12acc22ee84327c ]
    
    It just creates unnecessary bot noise these days.
    
    Reported-by: syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: set bnobt/cntbt numrecs correctly when formatting new AGs [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Sep 24 11:38:51 2024 -0700

    xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
    
    [ Upstream commit 8e698ee72c4ecbbf18264568eb310875839fd601 ]
    
    Through generic/300, I discovered that mkfs.xfs creates corrupt
    filesystems when given these parameters:
    
    Filesystems formatted with --unsupported are not supported!!
    meta-data=/dev/sda               isize=512    agcount=8, agsize=16352 blks
             =                       sectsz=512   attr=2, projid32bit=1
             =                       crc=1        finobt=1, sparse=1, rmapbt=1
             =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
    data     =                       bsize=4096   blocks=130816, imaxpct=25
             =                       sunit=32     swidth=128 blks
    naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
    log      =internal log           bsize=4096   blocks=8192, version=2
             =                       sectsz=512   sunit=32 blks, lazy-count=1
    realtime =none                   extsz=4096   blocks=0, rtextents=0
             =                       rgcount=0    rgsize=0 blks
    Discarding blocks...Done.
    Phase 1 - find and verify superblock...
            - reporting progress in intervals of 15 minutes
    Phase 2 - using internal log
            - zero log...
            - 16:30:50: zeroing log - 16320 of 16320 blocks done
            - scan filesystem freespace and inode maps...
    agf_freeblks 25, counted 0 in ag 4
    sb_fdblocks 8823, counted 8798
    
    The root cause of this problem is the numrecs handling in
    xfs_freesp_init_recs, which is used to initialize a new AG.  Prior to
    calling the function, we set up the new bnobt block with numrecs == 1
    and rely on _freesp_init_recs to format that new record.  If the last
    record created has a blockcount of zero, then it sets numrecs = 0.
    
    That last bit isn't correct if the AG contains the log, the start of the
    log is not immediately after the initial blocks due to stripe alignment,
    and the end of the log is perfectly aligned with the end of the AG.  For
    this case, we actually formatted a single bnobt record to handle the
    free space before the start of the (stripe aligned) log, and incremented
    arec to try to format a second record.  That second record turned out to
    be unnecessary, so what we really want is to leave numrecs at 1.
    
    The numrecs handling itself is overly complicated because a different
    function sets numrecs == 1.  Change the bnobt creation code to start
    with numrecs set to zero and only increment it after successfully
    formatting a free space extent into the btree block.
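
    A simplified model of the "start at zero, count what you format"
    approach is shown below. The record layout, parameter names, and
    toy_init_freesp() itself are assumptions for illustration and do not
    mirror xfs_freesp_init_recs line for line.

```c
#include <assert.h>

/* Toy free-space record; not the real ondisk record layout. */
struct toy_rec {
    unsigned int startblock;
    unsigned int blockcount;
};

/*
 * Describe the free space of a new AG spanning [first_free, ag_blocks),
 * with an internal log occupying [log_start, log_start + log_len).
 * Pass log_len == 0 for an AG without a log.
 * Returns numrecs: it starts at zero and is bumped only after a
 * non-empty extent has actually been written into recs[].
 */
static unsigned int toy_init_freesp(struct toy_rec recs[2],
                                    unsigned int first_free,
                                    unsigned int ag_blocks,
                                    unsigned int log_start,
                                    unsigned int log_len)
{
    unsigned int numrecs = 0;
    unsigned int start = first_free;

    assert(first_free <= ag_blocks);
    if (log_len > 0) {
        assert(log_start >= first_free);
        assert(log_start + log_len <= ag_blocks);
        /* Free space before a (stripe-aligned) log, if any. */
        if (log_start > start) {
            recs[numrecs].startblock = start;
            recs[numrecs].blockcount = log_start - start;
            numrecs++;
        }
        start = log_start + log_len;
    }
    /* Free space after the log, or the whole AG if there is no log. */
    if (ag_blocks > start) {
        recs[numrecs].startblock = start;
        recs[numrecs].blockcount = ag_blocks - start;
        numrecs++;
    }
    return numrecs;
}
```

    When the log starts past the initial blocks and ends exactly at the
    end of the AG, only the first extent is written and the count stays
    at one, which is the case the old code got wrong.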
    
    Fixes: f327a00745ff ("xfs: account for log space when formatting new AGs")
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list [+ + +]
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Tue Sep 24 11:38:47 2024 -0700

    xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
    
    [ Upstream commit f12b96683d6976a3a07fdf3323277c79dbe8f6ab ]
    
    Alter the definition of i_prev_unlinked slightly to make it more obvious
    when an inode with 0 link count is not part of the iunlink bucket lists
    rooted in the AGI.  This distinction is necessary because it is not
    sufficient to check inode.i_nlink to decide if an inode is on the
    unlinked list.  Updates to i_nlink can happen while holding only
    ILOCK_EXCL, but updates to an inode's position in the AGI unlinked list
    (which happen after the nlink update) require both ILOCK_EXCL and the
    AGI buffer lock.
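
    One way to encode the distinction is to reserve sentinel values in
    the back pointer. The sketch below assumes that zero means "not on
    any unlinked list" and that an all-ones marker means "first entry in
    its bucket"; both the encoding and the names are assumptions made for
    illustration rather than a statement of the exact kernel definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NULLAGINO ((uint32_t)-1)  /* assumed "no previous inode" marker */

struct toy_ino_state {
    uint32_t nlink;
    /* 0: not on a list; TOY_NULLAGINO: list head; else: previous inode. */
    uint32_t prev_unlinked;
};

/* List membership is read from the back pointer, never from nlink. */
static bool toy_on_unlinked_list(const struct toy_ino_state *ip)
{
    return ip->prev_unlinked != 0;
}
```

    An inode whose link count has already dropped to zero but which has
    not yet been added to a bucket reports false here, which is the
    window that checking the link count alone cannot detect.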
    
    The next few patches will make it possible to reload an entire unlinked
    bucket list when we're walking the inode table or performing handle
    operations and need more than the ability to iget the last inode in the
    chain.
    
    The upcoming directory repair code also needs to be able to make this
    distinction to decide if a zero link count directory should be moved to
    the orphanage or allowed to inactivate.  An upcoming enhancement to the
    online AGI fsck code will need this distinction to check and rebuild the
    AGI unlinked buckets.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    Acked-by: Chandan Babu R <chandanbabu@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>