Changelog in Linux kernel 6.10.7

ACPI: EC: Evaluate _REG outside the EC scope more carefully [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Aug 12 15:16:21 2024 +0200

    ACPI: EC: Evaluate _REG outside the EC scope more carefully
    
    commit 71bf41b8e913ec9fc91f0d39ab8fb320229ec604 upstream.
    
    Commit 60fa6ae6e6d0 ("ACPI: EC: Install address space handler at the
    namespace root") caused _REG methods for EC operation regions outside
    the EC device scope to be evaluated, which on some systems leads to the
    evaluation of _REG methods in the scopes of device objects representing
    devices that are not present and not functional according to the _STA
    return values. Some of those device objects represent EC "alternatives"
    and if _REG is evaluated for their operation regions, the platform
    firmware may be confused and the platform may start to behave
    incorrectly.
    
    To avoid this problem, only evaluate _REG for EC operation regions
    located in the scopes of device objects representing known-to-be-present
    devices.
    
    For this purpose, partially revert commit 60fa6ae6e6d0 and trigger the
    evaluation of _REG for EC operation regions from acpi_bus_attach() for
    the known-valid devices.
    
    Fixes: 60fa6ae6e6d0 ("ACPI: EC: Install address space handler at the namespace root")
    Link: https://lore.kernel.org/linux-acpi/1f76b7e2-1928-4598-8037-28a1785c2d13@redhat.com
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2298938
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2302253
    Reported-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Cc: All applicable <stable@vger.kernel.org>
    Link: https://patch.msgid.link/23612351.6Emhk5qWAg@rjwysocki.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPI: video: Add backlight=native quirk for Dell OptiPlex 7760 AIO [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Wed Aug 14 21:01:59 2024 +0200

    ACPI: video: Add backlight=native quirk for Dell OptiPlex 7760 AIO
    
    commit 5c7bb62cb8f53de71d8ab3d619be22740da0b837 upstream.
    
    Dell All In One (AIO) models released after 2017 may use a backlight
    controller board connected to a UART.
    
    In the DSDT this UART port will be defined as:
    
       Name (_HID, "DELL0501")
       Name (_CID, EisaId ("PNP0501"))
    
    The Dell OptiPlex 7760 AIO has an ACPI device for one of its UARTs with
    the above _HID + _CID. Loading the dell-uart-backlight driver shows that
    there actually is a backlight controller board attached to the UART,
    which reports a firmware version of "G&MX01-V15".
    
    But the backlight controller board does not actually control the backlight
    brightness and the GPU's native backlight control method does work.
    
    Add a quirk to use the GPU's native backlight control method on this model.
    
    Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
    Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2303936
    Cc: All applicable <stable@vger.kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Link: https://patch.msgid.link/20240814190159.15650-4-hdegoede@redhat.com
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPI: video: Add Dell UART backlight controller detection [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Wed Aug 14 21:01:57 2024 +0200

    ACPI: video: Add Dell UART backlight controller detection
    
    commit cd8e468efb4fb2742e06328a75b282c35c1abf8d upstream.
    
    Dell All In One (AIO) models released after 2017 use a backlight
    controller board connected to a UART.
    
    In the DSDT this UART port will be defined as:
    
       Name (_HID, "DELL0501")
       Name (_CID, EisaId ("PNP0501"))
    
    Commit 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
    has added support for this, but I neglected to tie this into
    acpi_video_get_backlight_type().
    
    Now the first AIO has turned up which has not only the DSDT bits for this,
    but also an actual controller attached to the UART, yet it is not using
    this controller for backlight control.
    
    Add support to acpi_video_get_backlight_type() for a new dell_uart
    backlight type, so that the existing infrastructure for overriding
    the backlight control method on the command line or with DMI quirks
    can be used.
    
    Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
    Cc: All applicable <stable@vger.kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Link: https://patch.msgid.link/20240814190159.15650-2-hdegoede@redhat.com
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPICA: Add a depth argument to acpi_execute_reg_methods() [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Aug 12 15:11:42 2024 +0200

    ACPICA: Add a depth argument to acpi_execute_reg_methods()
    
    commit cdf65d73e001fde600b18d7e45afadf559425ce5 upstream.
    
    A subsequent change will need to pass a depth argument to
    acpi_execute_reg_methods(), so prepare that function for it.
    
    No intentional functional changes.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Cc: All applicable <stable@vger.kernel.org>
    Link: https://patch.msgid.link/8451567.NyiUUSuA9g@rjwysocki.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

alloc_tag: introduce clear_page_tag_ref() helper function [+ + +]

Author: Suren Baghdasaryan <surenb@google.com>
Date:   Tue Aug 13 08:07:56 2024 -0700

    alloc_tag: introduce clear_page_tag_ref() helper function
    
    commit a8fc28dad6d574582cdf2f7e78c73c59c623df30 upstream.
    
    In several cases we are freeing pages which were not allocated using
    common page allocators.  For such cases, in order to keep allocation
    accounting correct, we should clear the page tag to indicate that the page
    being freed is expected to not have a valid allocation tag.  Introduce
    clear_page_tag_ref() helper function to be used for this.
    
    Link: https://lkml.kernel.org/r/20240813150758.855881-1-surenb@google.com
    Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Suggested-by: David Hildenbrand <david@redhat.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Kent Overstreet <kent.overstreet@linux.dev>
    Cc: Sourav Panda <souravpanda@google.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: <stable@vger.kernel.org>    [6.10]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

alloc_tag: mark pages reserved during CMA activation as not tagged [+ + +]

Author: Suren Baghdasaryan <surenb@google.com>
Date:   Tue Aug 13 08:07:57 2024 -0700

    alloc_tag: mark pages reserved during CMA activation as not tagged
    
    commit 766c163c2068b45330664fb67df67268e588a22d upstream.
    
    During CMA activation, pages in CMA area are prepared and then freed
    without being allocated.  This triggers warnings when memory allocation
    debug config (CONFIG_MEM_ALLOC_PROFILING_DEBUG) is enabled.  Fix this by
    marking these pages not tagged before freeing them.
    
    Link: https://lkml.kernel.org/r/20240813150758.855881-2-surenb@google.com
    Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Kent Overstreet <kent.overstreet@linux.dev>
    Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
    Cc: Sourav Panda <souravpanda@google.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: <stable@vger.kernel.org>    [6.10]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7 [+ + +]

Author: Parsa Poorshikhian <parsa.poorsh@gmail.com>
Date:   Sat Aug 10 18:39:06 2024 +0330

    ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7
    
    [ Upstream commit ef9718b3d54e822de294351251f3a574f8a082ce ]
    
    Fix noise from speakers connected to AUX port when no sound is playing.
    The problem occurs because the `alc_shutup_pins` function includes
    a 0x10ec0257 vendor ID, which causes noise on Lenovo IdeaPad 3 15IAU7 with
    Realtek ALC257 codec when no sound is playing.
    Removing this vendor ID from the function fixes the bug.
    
    Fixes: 70794b9563fe ("ALSA: hda/realtek: Add more codec ID to no shutup pins list")
    Signed-off-by: Parsa Poorshikhian <parsa.poorsh@gmail.com>
    Link: https://patch.msgid.link/20240810150939.330693-1-parsa.poorsh@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: hda/tas2781: fix wrong calibrated data order [+ + +]

Author: Baojun Xu <baojun.xu@ti.com>
Date:   Tue Aug 13 12:37:48 2024 +0800

    ALSA: hda/tas2781: fix wrong calibrated data order
    
    commit 3beddef84d90590270465a907de1cfe2539ac70d upstream.
    
    The wrong calibration data order causes the sound to be too low on
    some devices. Fix the calibrated data order by converting the
    calibration data with get_unaligned_be32() after reading it from UEFI.
    
    Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Baojun Xu <baojun.xu@ti.com>
    Link: https://patch.msgid.link/20240813043749.108-1-shenghao-ding@ti.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/tas2781: Use correct endian conversion [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Wed Aug 14 12:04:59 2024 +0200

    ALSA: hda/tas2781: Use correct endian conversion
    
    [ Upstream commit 829e2a23121fb36ee30ea5145c2a85199f68e2c8 ]
    
    The data conversion is done by the wrong function.  We convert to
    BE32, not from BE32.  Although the end result is the same, the
    compiler complained about it.
    
    Fix the code again and align it with the similar function
    tas2563_apply_calib(), which already does this correctly.
    
    Fixes: 3beddef84d90 ("ALSA: hda/tas2781: fix wrong calibrated data order")
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202408141630.DiDUB8Z4-lkp@intel.com/
    Link: https://patch.msgid.link/20240814100500.1944-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
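
The remark that converting "to BE32" and "from BE32" gives the same end result can be checked in userspace. This is a minimal sketch, not the driver code: to_be32() and from_be32() are hypothetical helpers written for this illustration, standing in for the kernel's byte-order conversions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: turn a CPU-order value into big-endian storage. */
static uint32_t to_be32(uint32_t cpu)
{
	const uint8_t bytes[4] = {
		(uint8_t)(cpu >> 24), (uint8_t)(cpu >> 16),
		(uint8_t)(cpu >> 8), (uint8_t)cpu
	};
	uint32_t be;

	memcpy(&be, bytes, sizeof(be));
	return be;
}

/* Hypothetical helper: turn big-endian storage into a CPU-order value. */
static uint32_t from_be32(uint32_t be)
{
	uint8_t bytes[4];

	memcpy(bytes, &be, sizeof(be));
	return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
	       ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
}

/* Both directions apply the same byte permutation on a given host. */
static int same_bits(uint32_t v)
{
	assert(from_be32(to_be32(v)) == v);	/* round trip is lossless */
	return to_be32(v) == from_be32(v);
}
```

On a little-endian host both helpers swap the bytes, and on a big-endian host both are the identity, so only the direction expressed by the types differs.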

ALSA: timer: Relax start tick time check for slave timer elements [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Sat Aug 10 10:48:32 2024 +0200

    ALSA: timer: Relax start tick time check for slave timer elements
    
    commit ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 upstream.
    
    The recent addition of a sanity check for a too low start tick time
    seems to break some applications that use aloop with a certain slave
    timer setup.  They may have an initial resolution of 0, hence it is
    treated as if it were a too low value.
    
    Relax the check by skipping it for slave timer instances to address
    the regression.
    
    Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
    Cc: <stable@vger.kernel.org>
    Link: https://github.com/raspberrypi/linux/issues/6294
    Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET [+ + +]

Author: Lianqin Hu <hulianqin@vivo.com>
Date:   Sun Aug 11 08:30:11 2024 +0000

    ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET
    
    commit 004eb8ba776ccd3e296ea6f78f7ae7985b12824e upstream.
    
    Audio control requests that set the sampling frequency sometimes fail
    on this card. Adding a delay between control messages eliminates the
    problem.
    
    Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
    Cc: <stable@vger.kernel.org>
    Link: https://patch.msgid.link/TYUPR06MB6217FF67076AF3E49E12C877D2842@TYUPR06MB6217.apcprd06.prod.outlook.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: usb-audio: Support Yamaha P-125 quirk entry [+ + +]

Author: Juan José Arboleda <soyjuanarbol@gmail.com>
Date:   Tue Aug 13 11:10:53 2024 -0500

    ALSA: usb-audio: Support Yamaha P-125 quirk entry
    
    commit c286f204ce6ba7b48e3dcba53eda7df8eaa64dd9 upstream.
    
    This patch adds a USB quirk for the Yamaha P-125 digital piano.
    
    Signed-off-by: Juan José Arboleda <soyjuanarbol@gmail.com>
    Cc: <stable@vger.kernel.org>
    Link: https://patch.msgid.link/20240813161053.70256-1-soyjuanarbol@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE [+ + +]

Author: Haibo Xu <haibo1.xu@intel.com>
Date:   Mon Aug 5 11:30:24 2024 +0800

    arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
    
    commit a21dcf0ea8566ebbe011c79d6ed08cdfea771de3 upstream.
    
    Currently, only acpi_early_node_map[0] is initialized to NUMA_NO_NODE.
    To ensure that all the values are properly initialized, initialize
    all of them to NUMA_NO_NODE.
    
    Fixes: e18962491696 ("arm64: numa: rework ACPI NUMA initialization")
    Cc: <stable@vger.kernel.org> # 4.19.x
    Reported-by: Andrew Jones <ajones@ventanamicro.com>
    Suggested-by: Andrew Jones <ajones@ventanamicro.com>
    Signed-off-by: Haibo Xu <haibo1.xu@intel.com>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
    Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
    Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
    Link: https://lore.kernel.org/r/853d7f74aa243f6f5999e203246f0d1ae92d2b61.1722828421.git.haibo1.xu@intel.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
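
The initialization pitfall described above is easy to reproduce in plain C. This is a minimal sketch, assuming a GNU C compiler (GCC or Clang) for the range designator. DEMO_NR_CPUS and the array names are hypothetical, and the range designator is shown as one way to fill every element, not as a quote of the actual patch.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4		/* hypothetical size, for illustration only */
#define NUMA_NO_NODE (-1)

/* Only element 0 gets NUMA_NO_NODE; the rest are zero-initialized. */
static int map_first_only[DEMO_NR_CPUS] = { NUMA_NO_NODE };

/* GNU C range designator: every element gets NUMA_NO_NODE. */
static int map_all[DEMO_NR_CPUS] = {
	[0 ... DEMO_NR_CPUS - 1] = NUMA_NO_NODE
};

/* Count how many entries of a map hold NUMA_NO_NODE. */
static int count_no_node(const int *map, int n)
{
	int count = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (map[i] == NUMA_NO_NODE)
			count++;
	return count;
}
```

Since 0 is a valid node number, the zero-filled entries in the first array look like real node assignments.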

arm64: Fix KASAN random tag seed initialization [+ + +]

Author: Samuel Holland <samuel.holland@sifive.com>
Date:   Wed Aug 14 02:09:53 2024 -0700

    arm64: Fix KASAN random tag seed initialization
    
    [ Upstream commit f75c235565f90c4a17b125e47f1c68ef6b8c2bce ]
    
    Currently, kasan_init_sw_tags() is called before setup_per_cpu_areas(),
    so per_cpu(prng_state, cpu) accesses the same address regardless of the
    value of "cpu", and the same seed value gets copied to the percpu area
    for every CPU. Fix this by moving the call to smp_prepare_boot_cpu(),
    which is the first architecture hook after setup_per_cpu_areas().
    
    Fixes: 3c9e3aa11094 ("kasan: add tag related helper functions")
    Fixes: 3f41b6093823 ("kasan: fix random seed generation for tag-based mode")
    Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Link: https://lore.kernel.org/r/20240814091005.969756-1-samuel.holland@sifive.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
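
The ordering problem can be pictured with a toy model. This is a sketch under simplifying assumptions, not kernel code: all demo_* names are hypothetical, and the per-CPU areas are reduced to an array of slots whose offsets stay zero until the setup step has run.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4

static unsigned int demo_area[DEMO_NR_CPUS];	/* one slot per CPU */
static unsigned int demo_offset[DEMO_NR_CPUS];	/* all zero until setup */

/* Toy per_cpu(): resolve a CPU number to its slot through the offset. */
static unsigned int *demo_per_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return &demo_area[demo_offset[cpu]];
}

/* Toy setup: copy the initial data to every CPU and assign offsets. */
static void demo_setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		demo_area[cpu] = demo_area[0];
		demo_offset[cpu] = (unsigned int)cpu;
	}
}

/* Try to give every CPU a distinct seed. */
static void demo_seed_all(unsigned int seed)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		*demo_per_cpu(cpu) = seed + (unsigned int)cpu;
}
```

In this model, seeding before the setup step writes every value to the same slot, and the setup step then copies that single value to all CPUs. Seeding after it gives each CPU its own value.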

ata: pata_macio: Fix DMA table overflow [+ + +]

Author: Michael Ellerman <mpe@ellerman.id.au>
Date:   Tue Aug 20 13:03:58 2024 +1000

    ata: pata_macio: Fix DMA table overflow
    
    commit 822c8020aebcf5804a143b891e34f29873fee5e2 upstream.
    
    Kolbjørn and Jonáš reported that their 32-bit PowerMacs were crashing
    in pata-macio since commit 09fe2bfa6b83 ("ata: pata_macio: Fix
    max_segment_size with PAGE_SIZE == 64K").
    
    For example:
    
      kernel BUG at drivers/ata/pata_macio.c:544!
      Oops: Exception in kernel mode, sig: 5 [#1]
      BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 DEBUG_PAGEALLOC PowerMac
      ...
      NIP pata_macio_qc_prep+0xf4/0x190
      LR  pata_macio_qc_prep+0xfc/0x190
      Call Trace:
        0xc1421660 (unreliable)
        ata_qc_issue+0x14c/0x2d4
        __ata_scsi_queuecmd+0x200/0x53c
        ata_scsi_queuecmd+0x50/0xe0
        scsi_queue_rq+0x788/0xb1c
        __blk_mq_issue_directly+0x58/0xf4
        blk_mq_plug_issue_direct+0x8c/0x1b4
        blk_mq_flush_plug_list.part.0+0x584/0x5e0
        __blk_flush_plug+0xf8/0x194
        __submit_bio+0x1b8/0x2e0
        submit_bio_noacct_nocheck+0x230/0x304
        btrfs_work_helper+0x200/0x338
        process_one_work+0x1a8/0x338
        worker_thread+0x364/0x4c0
        kthread+0x100/0x104
        start_kernel_thread+0x10/0x14
    
    That commit increased max_segment_size to 64KB, with the justification
    that the SCSI core was already using that size when PAGE_SIZE == 64KB,
    and that there was existing logic to split over-sized requests.
    
    However with a sufficiently large request, the splitting logic causes
    each sg to be split into two commands in the DMA table, leading to
    overflow of the DMA table, triggering the BUG_ON().
    
    With default settings the bug doesn't trigger, because the request size
    is limited by max_sectors_kb == 1280, however max_sectors_kb can be
    increased, and apparently some distros do that by default using udev
    rules.
    
    Fix the bug for 4KB kernels by reverting to the old max_segment_size.
    
    For 64KB kernels the sg_tablesize needs to be halved, to allow for the
    possibility that each sg will be split into two.
    
    Fixes: 09fe2bfa6b83 ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K")
    Cc: stable@vger.kernel.org # v6.10+
    Reported-by: Kolbjørn Barmen <linux-ppc@kolla.no>
    Closes: https://lore.kernel.org/all/62d248bb-e97a-25d2-bcf2-9160c518cae5@kolla.no/
    Reported-by: Jonáš Vidra <vidra@ufal.mff.cuni.cz>
    Closes: https://lore.kernel.org/all/3b6441b8-06e6-45da-9e55-f92f2c86933e@ufal.mff.cuni.cz/
    Tested-by: Kolbjørn Barmen <linux-ppc@kolla.no>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
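
The reason for halving sg_tablesize can be shown with a little arithmetic. This is a minimal sketch: the per-command byte limit and the table capacity below are assumed values chosen for illustration, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed values for this sketch, not taken from the text above. */
#define DEMO_MAX_CMD_BYTES 0xff00u	/* largest transfer per DMA command */
#define DEMO_TABLE_CMDS    256u		/* DMA table capacity in commands */

/* Number of DMA commands needed for one sg entry of seg_len bytes. */
static unsigned int cmds_for_seg(unsigned int seg_len)
{
	assert(seg_len > 0);
	return (seg_len + DEMO_MAX_CMD_BYTES - 1) / DEMO_MAX_CMD_BYTES;
}

/* Worst-case commands for nr_segs entries of at most max_seg_len bytes. */
static unsigned int worst_case_cmds(unsigned int nr_segs,
				    unsigned int max_seg_len)
{
	return nr_segs * cmds_for_seg(max_seg_len);
}
```

Under these assumptions a 64KB segment needs two commands, so a full table of such segments would need twice the table's capacity, while half as many segments fit exactly.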

atm: idt77252: prevent use after free in dequeue_rx() [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Fri Aug 9 15:28:19 2024 +0300

    atm: idt77252: prevent use after free in dequeue_rx()
    
    [ Upstream commit a9a18e8f770c9b0703dab93580d0b02e199a4c79 ]
    
    We can't dereference "skb" after calling vcc->push() because the skb
    is released.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
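
One common way to avoid this kind of use after free is to copy whatever is still needed before handing the buffer over. This is a minimal userspace sketch, not necessarily the exact change made in the driver: struct demo_buf and demo_push() are hypothetical stand-ins for the skb and for vcc->push().

```c
#include <assert.h>
#include <stdlib.h>

struct demo_buf {		/* hypothetical stand-in for an skb */
	size_t len;
};

/* Allocate a buffer with the given length; may return NULL. */
static struct demo_buf *demo_alloc(size_t len)
{
	struct demo_buf *buf = malloc(sizeof(*buf));

	if (buf)
		buf->len = len;
	return buf;
}

/* Stand-in for vcc->push(): takes ownership and releases the buffer. */
static void demo_push(struct demo_buf *buf)
{
	free(buf);
}

/* Hand the buffer over and report how many bytes were delivered. */
static size_t deliver(struct demo_buf *buf)
{
	size_t len;

	assert(buf != NULL);
	len = buf->len;		/* read while we still own the buffer */
	demo_push(buf);		/* buf must not be touched after this */
	return len;
}
```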

block: Fix lockdep warning in blk_mq_mark_tag_wait [+ + +]

Author: Li Lingfeng <lilingfeng3@huawei.com>
Date:   Thu Aug 15 10:47:36 2024 +0800

    block: Fix lockdep warning in blk_mq_mark_tag_wait
    
    [ Upstream commit b313a8c835516bdda85025500be866ac8a74e022 ]
    
    Lockdep reported a warning in Linux version 6.6:
    
    [  414.344659] ================================
    [  414.345155] WARNING: inconsistent lock state
    [  414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
    [  414.346221] --------------------------------
    [  414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
    [  414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
    [  414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
    [  414.351204] {IN-SOFTIRQ-W} state was registered at:
    [  414.351751]   lock_acquire+0x18d/0x460
    [  414.352218]   _raw_spin_lock_irqsave+0x39/0x60
    [  414.352769]   __wake_up_common_lock+0x22/0x60
    [  414.353289]   sbitmap_queue_wake_up+0x375/0x4f0
    [  414.353829]   sbitmap_queue_clear+0xdd/0x270
    [  414.354338]   blk_mq_put_tag+0xdf/0x170
    [  414.354807]   __blk_mq_free_request+0x381/0x4d0
    [  414.355335]   blk_mq_free_request+0x28b/0x3e0
    [  414.355847]   __blk_mq_end_request+0x242/0xc30
    [  414.356367]   scsi_end_request+0x2c1/0x830
    [  414.356863]   scsi_io_completion+0x177/0x1610
    [  414.357379]   scsi_complete+0x12f/0x260
    [  414.357856]   blk_complete_reqs+0xba/0xf0
    [  414.358338]   __do_softirq+0x1b0/0x7a2
    [  414.358796]   irq_exit_rcu+0x14b/0x1a0
    [  414.359262]   sysvec_call_function_single+0xaf/0xc0
    [  414.359828]   asm_sysvec_call_function_single+0x1a/0x20
    [  414.360426]   default_idle+0x1e/0x30
    [  414.360873]   default_idle_call+0x9b/0x1f0
    [  414.361390]   do_idle+0x2d2/0x3e0
    [  414.361819]   cpu_startup_entry+0x55/0x60
    [  414.362314]   start_secondary+0x235/0x2b0
    [  414.362809]   secondary_startup_64_no_verify+0x18f/0x19b
    [  414.363413] irq event stamp: 428794
    [  414.363825] hardirqs last  enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200
    [  414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50
    [  414.365629] softirqs last  enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2
    [  414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0
    [  414.367425]
                   other info that might help us debug this:
    [  414.368194]  Possible unsafe locking scenario:
    [  414.368900]        CPU0
    [  414.369225]        ----
    [  414.369548]   lock(&sbq->ws[i].wait);
    [  414.370000]   <Interrupt>
    [  414.370342]     lock(&sbq->ws[i].wait);
    [  414.370802]
                    *** DEADLOCK ***
    [  414.371569] 5 locks held by kworker/u10:3/1152:
    [  414.372088]  #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0
    [  414.373180]  #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0
    [  414.374384]  #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00
    [  414.375342]  #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
    [  414.376377]  #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0
    [  414.378607]
                   stack backtrace:
    [  414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6
    [  414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    [  414.381177] Workqueue: writeback wb_workfn (flush-253:0)
    [  414.381805] Call Trace:
    [  414.382136]  <TASK>
    [  414.382429]  dump_stack_lvl+0x91/0xf0
    [  414.382884]  mark_lock_irq+0xb3b/0x1260
    [  414.383367]  ? __pfx_mark_lock_irq+0x10/0x10
    [  414.383889]  ? stack_trace_save+0x8e/0xc0
    [  414.384373]  ? __pfx_stack_trace_save+0x10/0x10
    [  414.384903]  ? graph_lock+0xcf/0x410
    [  414.385350]  ? save_trace+0x3d/0xc70
    [  414.385808]  mark_lock.part.20+0x56d/0xa90
    [  414.386317]  mark_held_locks+0xb0/0x110
    [  414.386791]  ? __pfx_do_raw_spin_lock+0x10/0x10
    [  414.387320]  lockdep_hardirqs_on_prepare+0x297/0x3f0
    [  414.387901]  ? _raw_spin_unlock_irq+0x28/0x50
    [  414.388422]  trace_hardirqs_on+0x58/0x100
    [  414.388917]  _raw_spin_unlock_irq+0x28/0x50
    [  414.389422]  __blk_mq_tag_busy+0x1d6/0x2a0
    [  414.389920]  __blk_mq_get_driver_tag+0x761/0x9f0
    [  414.390899]  blk_mq_dispatch_rq_list+0x1780/0x1ee0
    [  414.391473]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
    [  414.392070]  ? sbitmap_get+0x2b8/0x450
    [  414.392533]  ? __blk_mq_get_driver_tag+0x210/0x9f0
    [  414.393095]  __blk_mq_sched_dispatch_requests+0xd99/0x1690
    [  414.393730]  ? elv_attempt_insert_merge+0x1b1/0x420
    [  414.394302]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
    [  414.394970]  ? lock_acquire+0x18d/0x460
    [  414.395456]  ? blk_mq_run_hw_queue+0x637/0xa00
    [  414.395986]  ? __pfx_lock_acquire+0x10/0x10
    [  414.396499]  blk_mq_sched_dispatch_requests+0x109/0x190
    [  414.397100]  blk_mq_run_hw_queue+0x66e/0xa00
    [  414.397616]  blk_mq_flush_plug_list.part.17+0x614/0x2030
    [  414.398244]  ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10
    [  414.398897]  ? writeback_sb_inodes+0x241/0xcc0
    [  414.399429]  blk_mq_flush_plug_list+0x65/0x80
    [  414.399957]  __blk_flush_plug+0x2f1/0x530
    [  414.400458]  ? __pfx___blk_flush_plug+0x10/0x10
    [  414.400999]  blk_finish_plug+0x59/0xa0
    [  414.401467]  wb_writeback+0x7cc/0x920
    [  414.401935]  ? __pfx_wb_writeback+0x10/0x10
    [  414.402442]  ? mark_held_locks+0xb0/0x110
    [  414.402931]  ? __pfx_do_raw_spin_lock+0x10/0x10
    [  414.403462]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
    [  414.404062]  wb_workfn+0x2b3/0xcf0
    [  414.404500]  ? __pfx_wb_workfn+0x10/0x10
    [  414.404989]  process_scheduled_works+0x432/0x13f0
    [  414.405546]  ? __pfx_process_scheduled_works+0x10/0x10
    [  414.406139]  ? do_raw_spin_lock+0x101/0x2a0
    [  414.406641]  ? assign_work+0x19b/0x240
    [  414.407106]  ? lock_is_held_type+0x9d/0x110
    [  414.407604]  worker_thread+0x6f2/0x1160
    [  414.408075]  ? __kthread_parkme+0x62/0x210
    [  414.408572]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
    [  414.409168]  ? __kthread_parkme+0x13c/0x210
    [  414.409678]  ? __pfx_worker_thread+0x10/0x10
    [  414.410191]  kthread+0x33c/0x440
    [  414.410602]  ? __pfx_kthread+0x10/0x10
    [  414.411068]  ret_from_fork+0x4d/0x80
    [  414.411526]  ? __pfx_kthread+0x10/0x10
    [  414.411993]  ret_from_fork_asm+0x1b/0x30
    [  414.412489]  </TASK>
    
    When interrupts are turned on while a lock taken with spin_lock_irq()
    is still held, lockdep throws a warning because of a potential deadlock.
    
    blk_mq_prep_dispatch_rq
     blk_mq_get_driver_tag
      __blk_mq_get_driver_tag
       __blk_mq_alloc_driver_tag
        blk_mq_tag_busy -> tag is already busy
        // failed to get driver tag
     blk_mq_mark_tag_wait
      spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait)
      __add_wait_queue(wq, wait) -> wait queue active
      blk_mq_get_driver_tag
      __blk_mq_tag_busy
    -> 1) tag must be idle, which means there can't be inflight IO
       spin_lock_irq(&tags->lock) -> lock B (hctx->tags)
       spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally
    -> 2) context must be preempted by IO interrupt to trigger deadlock.
    
    As shown above, the deadlock is not possible in theory, but the warning
    still needs to be fixed.
    
    Fix it by using spin_lock_irqsave to get lock B instead of spin_lock_irq.
    
    Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'")
    Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
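
The difference between the two locking variants can be modelled in userspace. This is a toy sketch, not kernel code: a single boolean stands in for the local interrupt state, the toy_* helpers are hypothetical, and no real locking is performed.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true;	/* toy model of the local IRQ state */

/* Model of spin_lock_irq()/spin_unlock_irq(): unconditional off/on. */
static void toy_lock_irq(void)   { irqs_enabled = false; }
static void toy_unlock_irq(void) { irqs_enabled = true; }

/* Model of spin_lock_irqsave()/spin_unlock_irqrestore(). */
static bool toy_lock_irqsave(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void toy_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* IRQ state seen while lock A is still held, inner lock B via _irq. */
static bool nested_with_irq(void)
{
	bool seen;

	toy_lock_irq();			/* lock A */
	assert(!irqs_enabled);
	toy_lock_irq();			/* lock B */
	toy_unlock_irq();		/* unlock B: IRQs turned back on */
	seen = irqs_enabled;
	toy_unlock_irq();		/* unlock A */
	return seen;
}

/* Same nesting, but inner lock B saves and restores the IRQ state. */
static bool nested_with_irqsave(void)
{
	bool flags, seen;

	toy_lock_irq();			/* lock A */
	flags = toy_lock_irqsave();	/* lock B */
	toy_unlock_irqrestore(flags);	/* unlock B: IRQs stay off */
	seen = irqs_enabled;
	toy_unlock_irq();		/* unlock A */
	return seen;
}
```

In the first variant interrupts are on again while lock A is still held, which is the state lockdep warns about. In the second they stay off until lock A is released.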

Bluetooth: HCI: Invert LE State quirk to be opt-out rather then opt-in [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Mon Aug 12 10:43:48 2024 -0400

    Bluetooth: HCI: Invert LE State quirk to be opt-out rather then opt-in
    
    [ Upstream commit aae6b81260fd9a7224f7eb4fc440d625852245bb ]
    
    This inverts the LE State quirk so that by default we assume the
    controllers report valid states rather than invalid ones, which is how
    quirks normally behave. This also results in the HCI command failing if
    the LE States are really broken, thus exposing the controllers that are
    really broken in this respect.
    
    Link: https://github.com/bluez/bluez/issues/584
    Fixes: 220915857e29 ("Bluetooth: Adding driver and quirk defs for multi-role LE")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: hci_core: Fix LE quote calculation [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Mon Aug 12 11:22:08 2024 -0400

    Bluetooth: hci_core: Fix LE quote calculation
    
    [ Upstream commit 932021a11805b9da4bd6abf66fe233cccd59fe0e ]
    
    Function hci_sched_le needs to update the respective counter variable
    in place, otherwise the likes of hci_quote_sent would attempt to use
    the possibly outdated value of conn->{le_cnt,acl_cnt}.
    
    Link: https://github.com/bluez/bluez/issues/915
    Fixes: 73d80deb7bdf ("Bluetooth: prioritizing data over HCI")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
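
The effect of updating a counter in place, as opposed to working on a local copy, can be sketched as follows. This is a simplified, hypothetical model and not the hci_core code: struct demo_hdev, demo_quote() and both send functions are invented for illustration.

```c
#include <assert.h>

struct demo_hdev {		/* hypothetical, heavily simplified */
	int le_pkts;
	int le_cnt;
	int acl_cnt;
};

/* Helper that reads the device counter, like a quota calculation would. */
static int demo_quote(const struct demo_hdev *h)
{
	return h->le_pkts ? h->le_cnt : h->acl_cnt;
}

/* Works on a copy: the helper keeps seeing the old counter value. */
static int send_with_copy(struct demo_hdev *h, int n)
{
	int cnt = h->le_pkts ? h->le_cnt : h->acl_cnt;
	int seen = 0;

	assert(n >= 0);
	while (cnt && n--) {
		seen = demo_quote(h);	/* stale: h is not updated yet */
		cnt--;
	}
	if (h->le_pkts)
		h->le_cnt = cnt;
	else
		h->acl_cnt = cnt;
	return seen;			/* value seen before the last send */
}

/* Works in place: the helper always sees the current counter value. */
static int send_in_place(struct demo_hdev *h, int n)
{
	int *cnt = h->le_pkts ? &h->le_cnt : &h->acl_cnt;
	int seen = 0;

	assert(n >= 0);
	while (*cnt && n--) {
		seen = demo_quote(h);	/* current value */
		(*cnt)--;
	}
	return seen;			/* value seen before the last send */
}
```

Both variants leave the same final counter, but only the in-place one lets the helper observe the decrements while the loop is running.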

Bluetooth: MGMT: Add error handling to pair_device() [+ + +]

Author: Griffin Kroah-Hartman <griffin@kroah.com>
Date:   Thu Aug 15 13:51:00 2024 +0200

    Bluetooth: MGMT: Add error handling to pair_device()
    
    commit 538fd3921afac97158d4177139a0ad39f056dbb2 upstream.
    
    The return value of hci_conn_params_add() is never checked for NULL,
    which could lead to a NULL pointer dereference causing a crash.
    
    Fix this by adding error handling to pair_device().
    
    Cc: Stable <stable@kernel.org>
    Fixes: 5157b8a503fa ("Bluetooth: Fix initializing conn_params in scan phase")
    Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com>
    Reported-by: Yiwei Zhang <zhan4630@purdue.edu>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: SMP: Fix assumption of Central always being Initiator [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Wed Aug 30 15:08:06 2023 -0700

    Bluetooth: SMP: Fix assumption of Central always being Initiator
    
    [ Upstream commit 28cd47f75185c4818b0fb1b46f2f02faaba96376 ]
    
    SMP initiator role shall be considered the one that initiates the
    pairing procedure with SMP_CMD_PAIRING_REQ:
    
    BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part H
    page 1557:
    
    Figure 2.1: LE pairing phases
    
    Note that sending SMP_CMD_SECURITY_REQ doesn't change the role to
    Initiator.
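
    One way to picture the rule is to track the SMP role separately from
    the link-layer role and set it only when a Pairing Request is sent.
    The sketch below is an illustration under that assumption; struct
    toy_smp and toy_smp_sent() are hypothetical and do not mirror the
    kernel's SMP state.

```c
#include <assert.h>
#include <stdbool.h>

/* SMP opcodes as named in the commit message. */
#define SMP_CMD_PAIRING_REQ	0x01
#define SMP_CMD_SECURITY_REQ	0x0b

/* Hypothetical per-connection state; not the kernel's struct smp_chan. */
struct toy_smp {
	bool central;	/* link-layer role */
	bool initiator;	/* SMP role, tracked separately */
};

/* Record the SMP role when a command is sent. */
static void toy_smp_sent(struct toy_smp *smp, unsigned char opcode)
{
	if (opcode == SMP_CMD_PAIRING_REQ)
		smp->initiator = true;	/* only a Pairing Request sets the role */
	/* SMP_CMD_SECURITY_REQ deliberately leaves the role unchanged. */
}
```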
    
    Link: https://github.com/bluez/bluez/issues/567
    Fixes: b28b4943660f ("Bluetooth: Add strict checks for allowed SMP PDUs")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bnxt_en: Fix double DMA unmapping for XDP_REDIRECT [+ + +]

Author: Somnath Kotur <somnath.kotur@broadcom.com>
Date:   Tue Aug 20 13:34:15 2024 -0700

    bnxt_en: Fix double DMA unmapping for XDP_REDIRECT
    
    [ Upstream commit 8baeef7616d5194045c5a6b97fd1246b87c55b13 ]
    
    Remove the dma_unmap_page_attrs() call in the driver's XDP_REDIRECT
    code path.  This should have been removed when we let the page pool
    handle the DMA mapping.  This bug causes the warning:
    
    WARNING: CPU: 7 PID: 59 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0xd5/0x100
    CPU: 7 PID: 59 Comm: ksoftirqd/7 Tainted: G        W          6.8.0-1010-gcp #11-Ubuntu
    Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 2.15.2 04/02/2024
    RIP: 0010:iommu_dma_unmap_page+0xd5/0x100
    Code: 89 ee 48 89 df e8 cb f2 69 ff 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 e9 ab 17 71 00 <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9
    RSP: 0018:ffffab1fc0597a48 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff99ff838280c8 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffffab1fc0597a78 R08: 0000000000000002 R09: ffffab1fc0597c1c
    R10: ffffab1fc0597cd3 R11: ffff99ffe375acd8 R12: 00000000e65b9000
    R13: 0000000000000050 R14: 0000000000001000 R15: 0000000000000002
    FS:  0000000000000000(0000) GS:ffff9a06efb80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000565c34c37210 CR3: 00000005c7e3e000 CR4: 0000000000350ef0
    ? show_regs+0x6d/0x80
    ? __warn+0x89/0x150
    ? iommu_dma_unmap_page+0xd5/0x100
    ? report_bug+0x16a/0x190
    ? handle_bug+0x51/0xa0
    ? exc_invalid_op+0x18/0x80
    ? iommu_dma_unmap_page+0xd5/0x100
    ? iommu_dma_unmap_page+0x35/0x100
    dma_unmap_page_attrs+0x55/0x220
    ? bpf_prog_4d7e87c0d30db711_xdp_dispatcher+0x64/0x9f
    bnxt_rx_xdp+0x237/0x520 [bnxt_en]
    bnxt_rx_pkt+0x640/0xdd0 [bnxt_en]
    __bnxt_poll_work+0x1a1/0x3d0 [bnxt_en]
    bnxt_poll+0xaa/0x1e0 [bnxt_en]
    __napi_poll+0x33/0x1e0
    net_rx_action+0x18a/0x2f0
    
    Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
    Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Signed-off-by: Michael Chan <michael.chan@broadcom.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://patch.msgid.link/20240820203415.168178-1-michael.chan@broadcom.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: fix bond_ipsec_offload_ok return type [+ + +]

Author: Nikolay Aleksandrov <razor@blackwall.org>
Date:   Fri Aug 16 14:48:10 2024 +0300

    bonding: fix bond_ipsec_offload_ok return type
    
    [ Upstream commit fc59b9a5f7201b9f7272944596113a82cc7773d5 ]
    
    Fix the return type which should be bool.
    
    Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()")
    Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: fix null pointer deref in bond_ipsec_offload_ok [+ + +]

Author: Nikolay Aleksandrov <razor@blackwall.org>
Date:   Fri Aug 16 14:48:11 2024 +0300

    bonding: fix null pointer deref in bond_ipsec_offload_ok
    
    [ Upstream commit 95c90e4ad89d493a7a14fa200082e466e2548f9d ]
    
    We must check if there is an active slave before dereferencing the pointer.
    
    Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: fix xfrm real_dev null pointer dereference [+ + +]

Author: Nikolay Aleksandrov <razor@blackwall.org>
Date:   Fri Aug 16 14:48:12 2024 +0300

    bonding: fix xfrm real_dev null pointer dereference
    
    [ Upstream commit f8cde9805981c50d0c029063dc7d82821806fc44 ]
    
    We shouldn't set real_dev to NULL because packets can be in transit and
    xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
    real_dev is set.
    
     Example trace:
     kernel: BUG: unable to handle page fault for address: 0000000000001030
     kernel: bond0: (slave eni0np1): making interface the new active one
     kernel: #PF: supervisor write access in kernel mode
     kernel: #PF: error_code(0x0002) - not-present page
     kernel: PGD 0 P4D 0
     kernel: Oops: 0002 [#1] PREEMPT SMP
     kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
     kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
     kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
     kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
     kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
     kernel: bond0: (slave eni0np1): making interface the new active one
     kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
     kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
     kernel:
     kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
     kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
     kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
     kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
     kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
     kernel: FS:  00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
     kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
     kernel: bond0: (slave eni0np1): making interface the new active one
     kernel: Call Trace:
     kernel:  <TASK>
     kernel:  ? __die+0x1f/0x60
     kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
     kernel:  ? page_fault_oops+0x142/0x4c0
     kernel:  ? do_user_addr_fault+0x65/0x670
     kernel:  ? kvm_read_and_reset_apf_flags+0x3b/0x50
     kernel: bond0: (slave eni0np1): making interface the new active one
     kernel:  ? exc_page_fault+0x7b/0x180
     kernel:  ? asm_exc_page_fault+0x22/0x30
     kernel:  ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
     kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
     kernel:  ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
     kernel: bond0: (slave eni0np1): making interface the new active one
     kernel:  bond_ipsec_offload_ok+0x7b/0x90 [bonding]
     kernel:  xfrm_output+0x61/0x3b0
     kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
     kernel:  ip_push_pending_frames+0x56/0x80
    
    Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: fix xfrm state handling when clearing active slave [+ + +]

Author: Nikolay Aleksandrov <razor@blackwall.org>
Date:   Fri Aug 16 14:48:13 2024 +0300

    bonding: fix xfrm state handling when clearing active slave
    
    [ Upstream commit c4c5c5d2ef40a9f67a9241dc5422eac9ffe19547 ]
    
    If the active slave is cleared manually the xfrm state is not flushed.
    This leads to xfrm add/del imbalance and adding the same state multiple
    times. For example when the device cannot handle any more states we get:
     [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
    because it's filled with the same state after multiple active slave
    clearings. This change also has a few nice side effects: user-space
    gets a notification for the change, the old device gets its mac address
    and promisc/mcast adjusted properly.
    
    Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bpf: Fix a kernel verifier crash in stacksafe() [+ + +]

Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Mon Aug 12 14:48:47 2024 -0700

    bpf: Fix a kernel verifier crash in stacksafe()
    
    [ Upstream commit bed2eb964c70b780fb55925892a74f26cb590b25 ]
    
    Daniel Hodges reported a kernel verifier crash when playing with sched-ext.
    Further investigation shows that the crash is due to invalid memory access
    in stacksafe(). More specifically, it is the following code:
    
        if (exact != NOT_EXACT &&
            old->stack[spi].slot_type[i % BPF_REG_SIZE] !=
            cur->stack[spi].slot_type[i % BPF_REG_SIZE])
                return false;
    
    The 'i' iterates old->allocated_stack.
    If cur->allocated_stack < old->allocated_stack the out-of-bound
    access will happen.
    
    To fix the issue add 'i >= cur->allocated_stack' check such that if
    the condition is true, stacksafe() should fail. Otherwise,
    cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal.
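
    A reduced model of the bounds check is sketched below. The toy_*
    types and TOY_REG_SIZE are hypothetical simplifications of the
    verifier state, and only the exact-comparison branch is modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_REG_SIZE 8	/* plays the role of BPF_REG_SIZE */

/* Hypothetical, simplified state; the verifier's real structs are richer. */
struct toy_slot {
	unsigned char slot_type[TOY_REG_SIZE];
};

struct toy_state {
	int allocated_stack;		/* bytes, multiple of TOY_REG_SIZE */
	const struct toy_slot *stack;	/* allocated_stack / TOY_REG_SIZE slots */
};

/* Exact comparison that refuses to index past cur's allocation. */
static bool toy_stacksafe_exact(const struct toy_state *old,
				const struct toy_state *cur)
{
	for (int i = 0; i < old->allocated_stack; i++) {
		int spi = i / TOY_REG_SIZE;

		/* Bounds check first; short-circuit keeps cur->stack[spi] unread. */
		if (i >= cur->allocated_stack ||
		    old->stack[spi].slot_type[i % TOY_REG_SIZE] !=
		    cur->stack[spi].slot_type[i % TOY_REG_SIZE])
			return false;
	}
	return true;
}
```

    Because the bounds test comes first in the || chain, the slot_type
    read on cur is never evaluated for an index beyond its allocation.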
    
    Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks")
    Cc: Eduard Zingerman <eddyz87@gmail.com>
    Reported-by: Daniel Hodges <hodgesd@meta.com>
    Acked-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20240812214847.213612-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

bpf: Fix updating attached freplace prog in prog_array map [+ + +]

Author: Leon Hwang <leon.hwang@linux.dev>
Date:   Sun Jul 28 19:46:11 2024 +0800

    bpf: Fix updating attached freplace prog in prog_array map
    
    [ Upstream commit fdad456cbcca739bae1849549c7a999857c56f88 ]
    
    The commit f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT")
    fixed a NULL pointer dereference panic, but didn't fix the issue that
    fails to update attached freplace prog to prog_array map.
    
    Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"),
    freplace prog and its target prog are able to tail call each other.
    
    And the commit 3aac1ead5eb6 ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach")
    sets prog->aux->dst_prog as NULL after attaching freplace prog to its
    target prog.
    
    After loading freplace the prog_array's owner type is BPF_PROG_TYPE_SCHED_CLS.
    Then, after attaching freplace its prog->aux->dst_prog is NULL.
    Then, while updating freplace in prog_array the bpf_prog_map_compatible()
    incorrectly returns false because resolve_prog_type() returns
    BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS.
    After this patch the resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS
    and update to prog_array can succeed.
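
    The message does not spell out how the type is resolved after attach.
    One plausible approach, sketched here purely as an assumption, is to
    resolve through a target type remembered at load time rather than
    through the dst_prog pointer that attach clears. All toy_* names and
    the saved_dst_type field are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the program types involved. */
enum toy_prog_type {
	TOY_PROG_TYPE_UNSPEC = 0,
	TOY_PROG_TYPE_SCHED_CLS,
	TOY_PROG_TYPE_EXT,
};

struct toy_prog {
	enum toy_prog_type type;
	const struct toy_prog *dst_prog;	/* cleared on attach */
	enum toy_prog_type saved_dst_type;	/* assumed to survive attach */
};

/* Resolve through the live target pointer; falls back once it is NULL. */
static enum toy_prog_type resolve_via_dst(const struct toy_prog *p)
{
	return (p->type == TOY_PROG_TYPE_EXT && p->dst_prog) ?
		p->dst_prog->type : p->type;
}

/* Resolve through a type remembered at load time instead. */
static enum toy_prog_type resolve_via_saved(const struct toy_prog *p)
{
	return (p->type == TOY_PROG_TYPE_EXT && p->saved_dst_type) ?
		p->saved_dst_type : p->type;
}
```

    In this model the pointer-based resolver reports the EXT type once
    dst_prog is NULL, which is the mismatch that makes the map
    compatibility check fail, while the saved-type resolver keeps
    reporting the target's type.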
    
    Fixes: f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT")
    Cc: Toke Høiland-Jørgensen <toke@redhat.com>
    Cc: Martin KaFai Lau <martin.lau@kernel.org>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
    Link: https://lore.kernel.org/r/20240728114612.48486-2-leon.hwang@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: check delayed refs when we're checking if a ref exists [+ + +]

Author: Josef Bacik <josef@toxicpanda.com>
Date:   Thu Apr 11 16:41:20 2024 -0400

    btrfs: check delayed refs when we're checking if a ref exists
    
    commit 42fac187b5c746227c92d024f1caf33bc1d337e4 upstream.
    
    In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
    resume") I added some code to handle file systems that had been
    corrupted by a bug that incorrectly skipped updating the drop progress
    key while dropping a snapshot.  This code would check to see if we had
    already deleted our reference for a child block, and skip the deletion
    if we had already.
    
    Unfortunately there is a bug, as the check would only check the on-disk
    references.  I made an incorrect assumption that blocks in an already
    deleted snapshot that was having the deletion resume on mount wouldn't
    be modified.
    
    If we have 2 pending deleted snapshots that share blocks, we can easily
    modify the rules for a block.  Take the following example:
    
    subvolume a exists, and subvolume b is a snapshot of subvolume a.  They
    share references to block 1.  Block 1 will have 2 full references, one
    for subvolume a and one for subvolume b, and it belongs to subvolume a
    (btrfs_header_owner(block 1) == subvolume a).
    
    When deleting subvolume a, we will drop our full reference for block 1,
    and because we are the owner we will drop our full reference for all of
    block 1's children, convert block 1 to FULL BACKREF, and add a shared
    reference to all of block 1's children.
    
    Then we will start the snapshot deletion of subvolume b.  We look up the
    extent info for block 1, which checks delayed refs and tells us that
    FULL BACKREF is set, so sets parent to the bytenr of block 1.  However
    because this is a resumed snapshot deletion, we call into
    check_ref_exists().  Because check_ref_exists() only looks at the disk,
    it doesn't find the shared backref for the child of block 1, and thus
    returns 0 and we skip deleting the reference for the child of block 1
    and continue.  This orphans the child of block 1.
    
    The fix is to lookup the delayed refs, similar to what we do in
    btrfs_lookup_extent_info().  However we only care about whether the
    reference exists or not.  If we fail to find our reference on disk, go
    look up the bytenr in the delayed refs, and if it exists look for an
    existing ref in the delayed ref head.  If that exists then we know we
    can delete the reference safely and carry on.  If it doesn't exist we
    know we have to skip over this block.
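
    The lookup order described above (disk first, then delayed refs) can
    be modelled with two small arrays. This is a sketch under simplifying
    assumptions; struct toy_refs and both helpers are hypothetical and do
    not reflect the btrfs data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: refs already committed vs. refs still queued in memory. */
struct toy_refs {
	const unsigned long *on_disk;	/* parent bytenrs with a committed ref */
	size_t nr_on_disk;
	const unsigned long *delayed;	/* parent bytenrs with a pending ref */
	size_t nr_delayed;
};

static bool toy_find(const unsigned long *v, size_t n, unsigned long parent)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == parent)
			return true;
	return false;
}

/* Disk first, then the delayed refs; only a miss in both means "skip". */
static bool toy_check_ref_exists(const struct toy_refs *r, unsigned long parent)
{
	if (toy_find(r->on_disk, r->nr_on_disk, parent))
		return true;
	return toy_find(r->delayed, r->nr_delayed, parent);
}
```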
    
    This bug has existed since I introduced this fix, however it requires
    having multiple deleted snapshots pending when we unmount.  We noticed
    this in production because our shutdown path stops the container on the
    system, which deletes a bunch of subvolumes, and then reboots the box.
    This gives us plenty of opportunities to hit this issue.  Looking at the
    history we've seen this occasionally in production, but we had a big
    spike recently thanks to faster machines getting jobs with multiple
    subvolumes in the job.
    
    Chris Mason wrote a reproducer which does the following:
    
    mount /dev/nvme4n1 /btrfs
    btrfs subvol create /btrfs/s1
    simoop -E -f 4k -n 200000 -z /btrfs/s1
    while(true) ; do
            btrfs subvol snap /btrfs/s1 /btrfs/s2
            simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
            btrfs subvol snap /btrfs/s2 /btrfs/s3
            btrfs balance start -dusage=80 /btrfs
            btrfs subvol del /btrfs/s2 /btrfs/s3
            umount /btrfs
            btrfsck /dev/nvme4n1 || exit 1
            mount /dev/nvme4n1 /btrfs
    done
    
    On the second loop this would fail consistently; with my patch it has
    been running for hours and hasn't failed.
    
    I also used dm-log-writes to capture the state of the failure so I could
    debug the problem.  Using the existing failure case to test my patch
    validated that it fixes the problem.
    
    Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: fix invalid mapping of extent xarray state [+ + +]

Author: Naohiro Aota <naohiro.aota@wdc.com>
Date:   Fri Aug 9 16:54:22 2024 +0900

    btrfs: fix invalid mapping of extent xarray state
    
    [ Upstream commit 6252690f7e1b173b86a4c27dfc046b351ab423e7 ]
    
    In __extent_writepage_io(), we call btrfs_set_range_writeback() ->
    folio_start_writeback(), which clears PAGECACHE_TAG_DIRTY mark from the
    mapping xarray if the folio is not dirty. This worked fine before commit
    97713b1a2ced ("btrfs: do not clear page dirty inside
    extent_write_locked_range()").
    
    After the commit, however, the folio is still dirty at this point, so the
    mapping DIRTY tag is not cleared anymore. Then, __extent_writepage_io()
    calls btrfs_folio_clear_dirty() to clear the folio's dirty flag. That
    results in the page being unlocked with a "strange" state. The page is not
    PageDirty, but the mapping tag is set as PAGECACHE_TAG_DIRTY.
    
    This strange state appears to cause a hang, with the call trace below, when
    running fstests generic/091 on a null_blk device. The task is waiting for a
    folio lock.
    
    While I don't have an exact relation between this hang and the strange
    state, fixing the state also fixes the hang. And, that state is worth
    fixing anyway.
    
    This commit reorders btrfs_folio_clear_dirty() and
    btrfs_set_range_writeback() in __extent_writepage_io(), so that the
    PAGECACHE_TAG_DIRTY tag is properly removed from the xarray.
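
    Why the order matters can be shown with a three-flag model. The
    struct and helpers below are hypothetical; the only behaviour taken
    from the description is that starting writeback drops the mapping's
    dirty tag only when the folio itself is no longer dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a folio plus its DIRTY tag in the mapping xarray. */
struct toy_folio {
	bool dirty;		/* per-folio dirty flag */
	bool writeback;
	bool tag_dirty;		/* stands in for PAGECACHE_TAG_DIRTY */
};

/* As described above: the tag is dropped only if the folio is clean. */
static void toy_start_writeback(struct toy_folio *f)
{
	f->writeback = true;
	if (!f->dirty)
		f->tag_dirty = false;
}

static void toy_clear_dirty(struct toy_folio *f)
{
	f->dirty = false;	/* does not touch the mapping tag */
}

/* Order that leaves the stale tag behind. */
static void toy_writeback_then_clear(struct toy_folio *f)
{
	toy_start_writeback(f);
	toy_clear_dirty(f);
}

/* Reordered sequence: clear the flag first so the tag is dropped too. */
static void toy_clear_then_writeback(struct toy_folio *f)
{
	toy_clear_dirty(f);
	toy_start_writeback(f);
}
```

    Starting from a dirty, tagged folio, the first order ends with a
    clean folio that is still tagged dirty, and the second order ends
    with both cleared.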
    
      [464.274] task:fsx             state:D stack:0     pid:3034  tgid:3034  ppid:2853   flags:0x00004002
      [464.286] Call Trace:
      [464.291]  <TASK>
      [464.295]  __schedule+0x10ed/0x6260
      [464.301]  ? __pfx___blk_flush_plug+0x10/0x10
      [464.308]  ? __submit_bio+0x37c/0x450
      [464.314]  ? __pfx___schedule+0x10/0x10
      [464.321]  ? lock_release+0x567/0x790
      [464.327]  ? __pfx_lock_acquire+0x10/0x10
      [464.334]  ? __pfx_lock_release+0x10/0x10
      [464.340]  ? __pfx_lock_acquire+0x10/0x10
      [464.347]  ? __pfx_lock_release+0x10/0x10
      [464.353]  ? do_raw_spin_lock+0x12e/0x270
      [464.360]  schedule+0xdf/0x3b0
      [464.365]  io_schedule+0x8f/0xf0
      [464.371]  folio_wait_bit_common+0x2ca/0x6d0
      [464.378]  ? folio_wait_bit_common+0x1cc/0x6d0
      [464.385]  ? __pfx_folio_wait_bit_common+0x10/0x10
      [464.392]  ? __pfx_filemap_get_folios_tag+0x10/0x10
      [464.400]  ? __pfx_wake_page_function+0x10/0x10
      [464.407]  ? __pfx___might_resched+0x10/0x10
      [464.414]  ? do_raw_spin_unlock+0x58/0x1f0
      [464.420]  extent_write_cache_pages+0xe49/0x1620 [btrfs]
      [464.428]  ? lock_acquire+0x435/0x500
      [464.435]  ? __pfx_extent_write_cache_pages+0x10/0x10 [btrfs]
      [464.443]  ? btrfs_do_write_iter+0x493/0x640 [btrfs]
      [464.451]  ? orc_find.part.0+0x1d4/0x380
      [464.457]  ? __pfx_lock_release+0x10/0x10
      [464.464]  ? __pfx_lock_release+0x10/0x10
      [464.471]  ? btrfs_do_write_iter+0x493/0x640 [btrfs]
      [464.478]  btrfs_writepages+0x1cc/0x460 [btrfs]
      [464.485]  ? __pfx_btrfs_writepages+0x10/0x10 [btrfs]
      [464.493]  ? is_bpf_text_address+0x6e/0x100
      [464.500]  ? kernel_text_address+0x145/0x160
      [464.507]  ? unwind_get_return_address+0x5e/0xa0
      [464.514]  ? arch_stack_walk+0xac/0x100
      [464.521]  do_writepages+0x176/0x780
      [464.527]  ? lock_release+0x567/0x790
      [464.533]  ? __pfx_do_writepages+0x10/0x10
      [464.540]  ? __pfx_lock_acquire+0x10/0x10
      [464.546]  ? __pfx_stack_trace_save+0x10/0x10
      [464.553]  ? do_raw_spin_lock+0x12e/0x270
      [464.560]  ? do_raw_spin_unlock+0x58/0x1f0
      [464.566]  ? _raw_spin_unlock+0x23/0x40
      [464.573]  ? wbc_attach_and_unlock_inode+0x3da/0x7d0
      [464.580]  filemap_fdatawrite_wbc+0x113/0x180
      [464.587]  ? prepare_pages.constprop.0+0x13c/0x5c0 [btrfs]
      [464.596]  __filemap_fdatawrite_range+0xaf/0xf0
      [464.603]  ? __pfx___filemap_fdatawrite_range+0x10/0x10
      [464.611]  ? trace_irq_enable.constprop.0+0xce/0x110
      [464.618]  ? kasan_quarantine_put+0xd7/0x1e0
      [464.625]  btrfs_start_ordered_extent+0x46f/0x570 [btrfs]
      [464.633]  ? __pfx_btrfs_start_ordered_extent+0x10/0x10 [btrfs]
      [464.642]  ? __clear_extent_bit+0x2c0/0x9d0 [btrfs]
      [464.650]  btrfs_lock_and_flush_ordered_range+0xc6/0x180 [btrfs]
      [464.659]  ? __pfx_btrfs_lock_and_flush_ordered_range+0x10/0x10 [btrfs]
      [464.669]  btrfs_read_folio+0x12a/0x1d0 [btrfs]
      [464.676]  ? __pfx_btrfs_read_folio+0x10/0x10 [btrfs]
      [464.684]  ? __pfx_filemap_add_folio+0x10/0x10
      [464.691]  ? __pfx___might_resched+0x10/0x10
      [464.698]  ? __filemap_get_folio+0x1c5/0x450
      [464.705]  prepare_uptodate_page+0x12e/0x4d0 [btrfs]
      [464.713]  prepare_pages.constprop.0+0x13c/0x5c0 [btrfs]
      [464.721]  ? fault_in_iov_iter_readable+0xd2/0x240
      [464.729]  btrfs_buffered_write+0x5bd/0x12f0 [btrfs]
      [464.737]  ? __pfx_btrfs_buffered_write+0x10/0x10 [btrfs]
      [464.745]  ? __pfx_lock_release+0x10/0x10
      [464.752]  ? generic_write_checks+0x275/0x400
      [464.759]  ? down_write+0x118/0x1f0
      [464.765]  ? up_write+0x19b/0x500
      [464.770]  btrfs_direct_write+0x731/0xba0 [btrfs]
      [464.778]  ? __pfx_btrfs_direct_write+0x10/0x10 [btrfs]
      [464.785]  ? __pfx___might_resched+0x10/0x10
      [464.792]  ? lock_acquire+0x435/0x500
      [464.798]  ? lock_acquire+0x435/0x500
      [464.804]  btrfs_do_write_iter+0x494/0x640 [btrfs]
      [464.811]  ? __pfx_btrfs_do_write_iter+0x10/0x10 [btrfs]
      [464.819]  ? __pfx___might_resched+0x10/0x10
      [464.825]  ? rw_verify_area+0x6d/0x590
      [464.831]  vfs_write+0x5d7/0xf50
      [464.837]  ? __might_fault+0x9d/0x120
      [464.843]  ? __pfx_vfs_write+0x10/0x10
      [464.849]  ? btrfs_file_llseek+0xb1/0xfb0 [btrfs]
      [464.856]  ? lock_release+0x567/0x790
      [464.862]  ksys_write+0xfb/0x1d0
      [464.867]  ? __pfx_ksys_write+0x10/0x10
      [464.873]  ? _raw_spin_unlock+0x23/0x40
      [464.879]  ? btrfs_getattr+0x4af/0x670 [btrfs]
      [464.886]  ? vfs_getattr_nosec+0x79/0x340
      [464.892]  do_syscall_64+0x95/0x180
      [464.898]  ? __do_sys_newfstat+0xde/0xf0
      [464.904]  ? __pfx___do_sys_newfstat+0x10/0x10
      [464.911]  ? trace_irq_enable.constprop.0+0xce/0x110
      [464.918]  ? syscall_exit_to_user_mode+0xac/0x2a0
      [464.925]  ? do_syscall_64+0xa1/0x180
      [464.931]  ? trace_irq_enable.constprop.0+0xce/0x110
      [464.939]  ? trace_irq_enable.constprop.0+0xce/0x110
      [464.946]  ? syscall_exit_to_user_mode+0xac/0x2a0
      [464.953]  ? btrfs_file_llseek+0xb1/0xfb0 [btrfs]
      [464.960]  ? do_syscall_64+0xa1/0x180
      [464.966]  ? btrfs_file_llseek+0xb1/0xfb0 [btrfs]
      [464.973]  ? trace_irq_enable.constprop.0+0xce/0x110
      [464.980]  ? syscall_exit_to_user_mode+0xac/0x2a0
      [464.987]  ? __pfx_btrfs_file_llseek+0x10/0x10 [btrfs]
      [464.995]  ? trace_irq_enable.constprop.0+0xce/0x110
      [465.002]  ? __pfx_btrfs_file_llseek+0x10/0x10 [btrfs]
      [465.010]  ? do_syscall_64+0xa1/0x180
      [465.016]  ? lock_release+0x567/0x790
      [465.022]  ? __pfx_lock_acquire+0x10/0x10
      [465.028]  ? __pfx_lock_release+0x10/0x10
      [465.034]  ? trace_irq_enable.constprop.0+0xce/0x110
      [465.042]  ? syscall_exit_to_user_mode+0xac/0x2a0
      [465.049]  ? do_syscall_64+0xa1/0x180
      [465.055]  ? syscall_exit_to_user_mode+0xac/0x2a0
      [465.062]  ? do_syscall_64+0xa1/0x180
      [465.068]  ? syscall_exit_to_user_mode+0xac/0x2a0
      [465.075]  ? do_syscall_64+0xa1/0x180
      [465.081]  ? clear_bhb_loop+0x25/0x80
      [465.087]  ? clear_bhb_loop+0x25/0x80
      [465.093]  ? clear_bhb_loop+0x25/0x80
      [465.099]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [465.106] RIP: 0033:0x7f093b8ee784
      [465.111] RSP: 002b:00007ffc29d31b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
      [465.122] RAX: ffffffffffffffda RBX: 0000000000006000 RCX: 00007f093b8ee784
      [465.131] RDX: 000000000001de00 RSI: 00007f093b6ed200 RDI: 0000000000000003
      [465.141] RBP: 000000000001de00 R08: 0000000000006000 R09: 0000000000000000
      [465.150] R10: 0000000000023e00 R11: 0000000000000202 R12: 0000000000006000
      [465.160] R13: 0000000000023e00 R14: 0000000000023e00 R15: 0000000000000001
      [465.170]  </TASK>
      [465.174] INFO: lockdep is turned off.
    
    Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Fixes: 97713b1a2ced ("btrfs: do not clear page dirty inside extent_write_locked_range()")
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

btrfs: only enable extent map shrinker for DEBUG builds [+ + +]

Author: Qu Wenruo <wqu@suse.com>
Date:   Fri Aug 16 10:40:38 2024 +0930

    btrfs: only enable extent map shrinker for DEBUG builds
    
    commit 534f7eff9239c1b0af852fc33f5af2b62c00eddf upstream.
    
    Although there are several patches improving the extent map shrinker,
    there are still reports of too frequent shrinker behavior, taking too
    much CPU for the kswapd process.
    
    So let's only enable the extent map shrinker for DEBUG builds for now,
    until we get a more comprehensive understanding and a better solution.
    
    Link: https://lore.kernel.org/linux-btrfs/3df4acd616a07ef4d2dc6bad668701504b412ffc.camel@intelfx.name/
    Link: https://lore.kernel.org/linux-btrfs/c30fd6b3-ca7a-4759-8a53-d42878bf84f7@gmail.com/
    Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps")
    CC: stable@vger.kernel.org # 6.10+
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: only run the extent map shrinker from kswapd tasks [+ + +]

Author: Filipe Manana <fdmanana@suse.com>
Date:   Sun Aug 11 11:53:42 2024 +0100

    btrfs: only run the extent map shrinker from kswapd tasks
    
    commit ae1e766f623f7a2a889a0b09eb076dd9a60efbe9 upstream.
    
    Currently the extent map shrinker can be run by any task when attempting
    to allocate memory and there's enough memory pressure to trigger it.
    
    To avoid too much latency we stop iterating over extent maps and removing
    them once the task needs to reschedule. This logic was introduced in commit
    b3ebb9b7e92a ("btrfs: stop extent map shrinker if reschedule is needed").
    
    While that solved high latency problems for some use cases, it's still
    not enough because with a too high number of tasks entering the extent map
    shrinker code, either due to memory allocations or because they are a
    kswapd task, we end up having a very high level of contention on some
    spin locks, namely:
    
    1) The fs_info->fs_roots_radix_lock spin lock, which we need to find
       roots to iterate over their inodes;
    
    2) The spin lock of the xarray used to track open inodes for a root
       (struct btrfs_root::inodes) - on 6.10 kernels and below, it used to
       be a red black tree and the spin lock was root->inode_lock;
    
    3) The fs_info->delayed_iput_lock spin lock since the shrinker adds
       delayed iputs (calls btrfs_add_delayed_iput()).
    
    Instead of allowing the extent map shrinker to be run by any task, make
    it run only by kswapd tasks. This still solves the problem of running
    into OOM situations due to an unbounded extent map creation, which is
    simple to trigger by direct IO writes, as described in the changelog
    of commit 956a17d9d050 ("btrfs: add a shrinker for extent maps"), and
    by a similar case when doing buffered IO on files with a very large
    number of holes (keeping the file open and creating many holes, whose
    extent maps are only released when the file is closed).
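
    The gating itself amounts to an early return for every caller that
    is not kswapd. A minimal sketch, with a hypothetical struct toy_task
    flag in place of the kernel's own way of identifying kswapd:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task descriptor; the kernel identifies kswapd differently. */
struct toy_task {
	bool is_kswapd;
};

/* Toy shrinker entry point: returns how many extent maps it dropped. */
static long toy_free_cached_objects(const struct toy_task *task, long nr_to_scan)
{
	if (!task->is_kswapd)
		return 0;	/* direct reclaim callers do no shrinking work */
	return nr_to_scan;	/* pretend every scanned map was freed */
}
```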
    
    Reported-by: kzd <kzd@56709.net>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=219121
    Reported-by: Octavia Togami <octavia.togami@gmail.com>
    Link: https://lore.kernel.org/linux-btrfs/CAHPNGSSt-a4ZZWrtJdVyYnJFscFjP9S7rMcvEMaNSpR556DdLA@mail.gmail.com/
    Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps")
    CC: stable@vger.kernel.org # 6.10+
    Tested-by: kzd <kzd@56709.net>
    Tested-by: Octavia Togami <octavia.togami@gmail.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: send: allow cloning non-aligned extent if it ends at i_size [+ + +]

Author: Filipe Manana <fdmanana@suse.com>
Date:   Mon Aug 12 14:18:06 2024 +0100

    btrfs: send: allow cloning non-aligned extent if it ends at i_size
    
    commit 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea upstream.
    
    If we find that an extent is shared but its end offset is not sector
    size aligned, then we don't clone it and issue write operations instead.
    This is because the reflink (remap_file_range) operation does not allow
    cloning unaligned ranges, except if the end offset of the range matches
    the i_size of the source and destination files (and the start offset is
    sector size aligned).
    
    While this is not incorrect because send can only guarantee that a file
    has the same data in the source and destination snapshots, it's not
    optimal and generates confusion and surprising behaviour for users.
    
    For example, running this test:
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/sdi
      MNT=/mnt/sdi
    
      mkfs.btrfs -f $DEV
      mount $DEV $MNT
    
      # Use a file size not aligned to any possible sector size.
      file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
      dd if=/dev/random of=$MNT/foo bs=$file_size count=1
      cp --reflink=always $MNT/foo $MNT/bar
    
      btrfs subvolume snapshot -r $MNT/ $MNT/snap
      rm -f /tmp/send-test
      btrfs send -f /tmp/send-test $MNT/snap
    
      umount $MNT
      mkfs.btrfs -f $DEV
      mount $DEV $MNT
    
      btrfs receive -vv -f /tmp/send-test $MNT
    
      xfs_io -r -c "fiemap -v" $MNT/snap/bar
    
      umount $MNT
    
    Gives the following result:
    
      (...)
      mkfile o258-7-0
      rename o258-7-0 -> bar
      write bar - offset=0 length=49152
      write bar - offset=49152 length=49152
      write bar - offset=98304 length=49152
      write bar - offset=147456 length=49152
      write bar - offset=196608 length=49152
      write bar - offset=245760 length=49152
      write bar - offset=294912 length=49152
      write bar - offset=344064 length=49152
      write bar - offset=393216 length=49152
      write bar - offset=442368 length=49152
      write bar - offset=491520 length=49152
      write bar - offset=540672 length=49152
      write bar - offset=589824 length=49152
      write bar - offset=638976 length=49152
      write bar - offset=688128 length=49152
      write bar - offset=737280 length=49152
      write bar - offset=786432 length=49152
      write bar - offset=835584 length=49152
      write bar - offset=884736 length=49152
      write bar - offset=933888 length=49152
      write bar - offset=983040 length=49152
      write bar - offset=1032192 length=16389
      chown bar - uid=0, gid=0
      chmod bar - mode=0644
      utimes bar
      utimes
      BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
      /mnt/sdi/snap/bar:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..2055]:       26624..28679      2056   0x1
    
    There's no clone operation to clone extents from the file foo into file
    bar and fiemap confirms there's no shared flag (0x2000).
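
    The figures in this output can be cross-checked with a little
    arithmetic. The sketch below only reuses numbers printed above; the
    4096-byte sector size is an assumption (a common btrfs default), and
    the 512-byte unit is the block size xfs_io uses for fiemap ranges.

```python
# Figures copied from the test script and from the receive/fiemap output.
FILE_SIZE = 1 * 1024 * 1024 + 5   # 1MB + 5 bytes, as in test.sh
FULL_WRITES = 21                  # write commands with length=49152
FULL_WRITE_LEN = 49152
LAST_WRITE_LEN = 16389            # final, shorter write command
FIEMAP_BLOCKS = 2056              # TOTAL column, in 512-byte blocks
SECTOR_SIZE = 4096                # assumption: filesystem sector size


def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


# Bytes transferred by the write commands in the send stream.
total_written = FULL_WRITES * FULL_WRITE_LEN + LAST_WRITE_LEN

# Bytes covered by the single extent that fiemap reports.
extent_bytes = FIEMAP_BLOCKS * 512
```

    Under these assumptions the write commands add up to exactly the file
    size, and the extent covers the file size rounded up to a full sector.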
    
    So update send_write_or_clone() so that it proceeds with cloning if the
    source and destination ranges end at the i_size of the respective files.
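
    The rule described here can be modelled roughly as follows. This is a
    simplified sketch, not the kernel's send_write_or_clone(): the function
    name can_clone(), its parameters and the 4096-byte default sector size
    are all hypothetical and chosen only for illustration.

```python
def is_aligned(value, align):
    return value % align == 0


def can_clone(src_off, dst_off, length, src_i_size, dst_i_size,
              sectorsize=4096):
    """Hypothetical model of the cloning rule from the commit message."""
    # The start offsets must always be sector size aligned.
    if not (is_aligned(src_off, sectorsize) and
            is_aligned(dst_off, sectorsize)):
        return False

    src_end = src_off + length
    dst_end = dst_off + length

    # Fully aligned ranges can always be cloned.
    if is_aligned(src_end, sectorsize) and is_aligned(dst_end, sectorsize):
        return True

    # An unaligned end is acceptable only when the range ends at the
    # i_size of both the source and the destination file.
    return src_end == src_i_size and dst_end == dst_i_size
```

    With the 1048581-byte file from the test, can_clone(0, 0, 1048581,
    1048581, 1048581) is true in this model, while the same range inside a
    larger source file would still fall back to write commands.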
    
    After this change the result of the test is:
    
      (...)
      mkfile o258-7-0
      rename o258-7-0 -> bar
      clone bar - source=foo source offset=0 offset=0 length=1048581
      chown bar - uid=0, gid=0
      chmod bar - mode=0644
      utimes bar
      utimes
      BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
      /mnt/sdi/snap/bar:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..2055]:       26624..28679      2056 0x2001
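
    The FLAGS column can be decoded with the fiemap flag bits. A minimal
    sketch, using the two flag values from the Linux fiemap UAPI header
    that matter here; decode_fiemap_flags() is a hypothetical helper.

```python
# Flag bits from the Linux fiemap interface.
FIEMAP_EXTENT_LAST = 0x0001     # last extent in the file
FIEMAP_EXTENT_SHARED = 0x2000   # extent is shared with other files


def decode_fiemap_flags(flags):
    """Return the names of the flag bits this sketch knows about."""
    names = []
    if flags & FIEMAP_EXTENT_LAST:
        names.append("last")
    if flags & FIEMAP_EXTENT_SHARED:
        names.append("shared")
    return names
```

    In this sketch 0x1 (before the change) decodes to "last" only, while
    0x2001 (after the change) decodes to "last" and "shared".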
    
    A test case for fstests will also follow up soon.
    
    Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
    CC: stable@vger.kernel.org # 5.10+
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: tree-checker: add dev extent item checks [+ + +]

Author: Qu Wenruo <wqu@suse.com>
Date:   Sun Aug 11 15:00:22 2024 +0930

    btrfs: tree-checker: add dev extent item checks
    
    commit 008e2512dc5696ab2dc5bf264e98a9fe9ceb830e upstream.
    
    [REPORT]
    There is a corruption report that btrfs refused to mount a fs that has
    overlapping dev extents:
    
      BTRFS error (device sdc): dev extent devid 4 physical offset 14263979671552 overlap with previous dev extent end 14263980982272
      BTRFS error (device sdc): failed to verify dev extents against chunks: -117
      BTRFS error (device sdc): open_ctree failed
    
    [CAUSE]
    The direct cause is obvious: there is a bad dev extent item with
    incorrect length.
    
    With btrfs check reporting two overlapping extents, the second one shows
    some clue on the cause:
    
      ERROR: dev extent devid 4 offset 14263979671552 len 6488064 overlap with previous dev extent end 14263980982272
      ERROR: dev extent devid 13 offset 2257707008000 len 6488064 overlap with previous dev extent end 2257707270144
      ERROR: errors found in extent allocation tree or chunk allocation
    
    The second one looks like a bitflip happened during new chunk
    allocation:
    hex(2257707008000) = 0x20da9d30000
    hex(2257707270144) = 0x20da9d70000
    diff               = 0x00000040000
    
    So it looks like a bitflip happened during new dev extent allocation,
    resulting in the second overlap.
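
    The single-bit difference can be verified directly. The sketch below
    assumes that the new dev extent was meant to start where the previous
    one ends; flipped_bits() is a hypothetical helper.

```python
# Values from the btrfs check output for devid 13.
previous_end = 2257707270144   # end of the previous dev extent
actual_start = 2257707008000   # offset of the overlapping dev extent


def flipped_bits(a, b):
    """Return the positions of the bits that differ between a and b."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


bits = flipped_bits(actual_start, previous_end)
```

    Exactly one bit position (bit 18, value 0x40000) differs between the
    two values, which is consistent with a memory bitflip.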
    
    Currently we only do the dev-extent verification at mount time, but if the
    corruption is caused by memory bitflip, we really want to catch it before
    writing the corruption to the storage.
    
    Furthermore, dev extent items have the following key definition:
    
            (<device id> DEV_EXTENT <physical offset>)
    
    The key records only where each dev extent starts, not its length, so
    we cannot rely on the generic key order check alone to make sure there
    is no overlap.
    
    [ENHANCEMENT]
    Introduce dedicated dev extent checks, including:
    
    - Fixed member checks
      * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
      * chunk_objectid should always be
        BTRFS_FIRST_CHUNK_TREE_OBJECTID (256)
    
    - Alignment checks
      * chunk_offset should be aligned to sectorsize
      * length should be aligned to sectorsize
      * key.offset should be aligned to sectorsize
    
    - Overlap checks
      If the previous key is also a dev-extent item, with the same
      device id, make sure we do not overlap with the previous dev extent.
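
    The checks listed above can be sketched as follows. This is an
    illustrative model only, not the tree-checker code: the DevExtent
    class, check_dev_extent() and the 4096-byte default sector size are
    hypothetical, while the two objectid values are the ones named above.

```python
from dataclasses import dataclass
from typing import List, Optional

BTRFS_CHUNK_TREE_OBJECTID = 3
BTRFS_FIRST_CHUNK_TREE_OBJECTID = 256


@dataclass
class DevExtent:
    devid: int            # key.objectid
    offset: int           # key.offset, the physical offset on the device
    length: int
    chunk_offset: int = 0
    chunk_tree: int = BTRFS_CHUNK_TREE_OBJECTID
    chunk_objectid: int = BTRFS_FIRST_CHUNK_TREE_OBJECTID


def check_dev_extent(cur: DevExtent, prev: Optional[DevExtent] = None,
                     sectorsize: int = 4096) -> List[str]:
    """Return a list of problems found; an empty list means no error."""
    errors = []

    # Fixed member checks.
    if cur.chunk_tree != BTRFS_CHUNK_TREE_OBJECTID:
        errors.append("invalid chunk_tree")
    if cur.chunk_objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID:
        errors.append("invalid chunk_objectid")

    # Alignment checks.
    for name in ("chunk_offset", "length", "offset"):
        if getattr(cur, name) % sectorsize:
            errors.append(f"unaligned {name}")

    # Overlap check, only against a previous dev extent of the same device.
    if prev is not None and prev.devid == cur.devid:
        if prev.offset + prev.length > cur.offset:
            errors.append("overlap with previous dev extent")

    return errors
```

    Feeding the devid 13 values from the report into this model (with a
    made-up length for the previous extent, since the report only gives
    its end) flags the overlap, while the same item with no previous dev
    extent passes.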
    
    Reported: Stefan N <stefannnau@gmail.com>
    Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=aGc7d=E8mGEg@mail.gmail.com/
    CC: stable@vger.kernel.org # 5.10+
    Reviewed-by: Anand Jain <anand.jain@oracle.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type [+ + +]

Author: Qu Wenruo <wqu@suse.com>
Date:   Mon Aug 12 08:52:44 2024 +0930

    btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
    
    commit 31723c9542dba1681cc3720571fdf12ffe0eddd9 upstream.
    
    [REPORT]
    There is a bug report that the kernel is rejecting an inode whose mode
    does not match its dir item:
    
      [ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
      dir: inode mode=040700 btrfs type=2 dir type=0
    
    [CAUSE]
    It looks like the inode mode is correct, while the dir item type
    0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
    
    This may be caused by a memory bit flip.
    
    [ENHANCEMENT]
    Although tree-checker is not able to do any cross-leaf verification, for
    this particular case we can at least reject any dir type with
    BTRFS_FT_UNKNOWN.
    
    So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
    (0, BTRFS_FT_MAX).
    Although the existing corruption can not be fixed just by such enhanced
    checking, it should prevent the same 0x2->0x0 bitflip for dir type to
    reach disk in the future.
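
    The change of range can be illustrated with a small sketch. The two
    helper functions are hypothetical, and the numeric values of the
    BTRFS_FT_* constants are given as recalled from the btrfs on-disk
    format definitions, so treat them as an assumption.

```python
# Dir item types (assumed values from the btrfs on-disk format).
BTRFS_FT_UNKNOWN = 0
BTRFS_FT_REG_FILE = 1
BTRFS_FT_DIR = 2
BTRFS_FT_MAX = 9


def dir_type_valid_old(dir_type):
    """Old check: accepts the range [0, BTRFS_FT_MAX)."""
    return 0 <= dir_type < BTRFS_FT_MAX


def dir_type_valid_new(dir_type):
    """New check: accepts the range (0, BTRFS_FT_MAX)."""
    return 0 < dir_type < BTRFS_FT_MAX


# A single flipped bit turns BTRFS_FT_DIR (0x2) into BTRFS_FT_UNKNOWN (0x0).
corrupted = BTRFS_FT_DIR ^ 0x2
```

    The old range lets the corrupted value through, the new one rejects
    it, and valid types such as BTRFS_FT_DIR are accepted by both.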
    
    Reported-by: Kota <nospam@kota.moe>
    Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc3BG+xae4vTTw@mail.gmail.com/
    CC: stable@vger.kernel.org # 5.4+
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: zoned: properly take lock to read/update block group's zoned variables [+ + +]

Author: Naohiro Aota <naohiro.aota@wdc.com>
Date:   Thu Aug 1 16:47:52 2024 +0900

    btrfs: zoned: properly take lock to read/update block group's zoned variables
    
    commit e30729d4bd4001881be4d1ad4332a5d4985398f8 upstream.
    
    __btrfs_add_free_space_zoned() references and modifies bg's alloc_offset,
    ro, and zone_unusable, but without taking the lock. It is mostly safe
    because they monotonically increase (at least for now) and this function is
    mostly called by a transaction commit, which is serialized by itself.
    
    Still, taking the lock is a safer and correct option and I'm going to add a
    change to reset zone_unusable while a block group is still alive. So, add
    locking around the operations.
    
    Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
    CC: stable@vger.kernel.org # 5.15+
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cgroup/cpuset: Clear effective_xcpus on cpus_allowed clearing only if cpus.exclusive not set [+ + +]

Author: Waiman Long <longman@redhat.com>
Date:   Sun Aug 4 21:30:16 2024 -0400

    cgroup/cpuset: Clear effective_xcpus on cpus_allowed clearing only if cpus.exclusive not set
    
    commit 311a1bdc44a8e06024df4fd3392be0dfc8298655 upstream.
    
    Commit e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for
    v2") adds a user writable cpuset.cpus.exclusive file for setting
    exclusive CPUs to be used for the creation of partitions. Since then
    effective_xcpus depends on both the cpuset.cpus and cpuset.cpus.exclusive
    setting. If cpuset.cpus.exclusive is set, effective_xcpus will depend
    only on cpuset.cpus.exclusive.  When it is not set, effective_xcpus
    will be set according to the cpuset.cpus value when the cpuset becomes
    a valid partition root.
    
    When cpuset.cpus is being cleared by the user, effective_xcpus should
    only be cleared when cpuset.cpus.exclusive is not set. However, that
    is not currently the case.
    
      # cd /sys/fs/cgroup/
      # mkdir test
      # echo +cpuset > cgroup.subtree_control
      # cd test
      # echo 3 > cpuset.cpus.exclusive
      # cat cpuset.cpus.exclusive.effective
      3
      # echo > cpuset.cpus
      # cat cpuset.cpus.exclusive.effective // was cleared
    
    Fix it by clearing effective_xcpus only if cpuset.cpus.exclusive is
    not set.
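
    The intended behaviour can be modelled in a few lines. This is a toy
    model, not the cpuset code: the function name and its parameters are
    hypothetical, and CPU masks are represented as Python sets.

```python
def effective_xcpus_after_clearing_cpus(exclusive_cpus, effective_xcpus,
                                        fixed=True):
    """Model what happens to effective_xcpus when cpuset.cpus is cleared.

    exclusive_cpus:  contents of cpuset.cpus.exclusive (empty if not set)
    effective_xcpus: contents of cpuset.cpus.exclusive.effective
    fixed:           True models the behaviour after this change
    """
    if fixed and exclusive_cpus:
        # cpuset.cpus.exclusive is set, so it alone determines
        # effective_xcpus and clearing cpuset.cpus must not touch it.
        return set(effective_xcpus)
    # Otherwise effective_xcpus is cleared together with cpuset.cpus.
    return set()
```

    For the reproducer above, the model keeps {3} after the change and
    returns an empty set with fixed=False, matching the reported bug.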
    
    Fixes: e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2")
    Cc: stable@vger.kernel.org # v6.7+
    Reported-by: Chen Ridong <chenridong@huawei.com>
    Signed-off-by: Waiman Long <longman@redhat.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cgroup/cpuset: fix panic caused by partcmd_update [+ + +]

Author: Chen Ridong <chenridong@huawei.com>
Date:   Sun Aug 4 21:30:15 2024 -0400

    cgroup/cpuset: fix panic caused by partcmd_update
    
    commit 959ab6350add903e352890af53e86663739fcb9a upstream.
    
    We found the bug shown below:
    BUG: unable to handle page fault for address: 00000003
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 3 PID: 358 Comm: bash Tainted: G        W I        6.6.0-10893-g60d6
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/4
    RIP: 0010:partition_sched_domains_locked+0x483/0x600
    Code: 01 48 85 d2 74 0d 48 83 05 29 3f f8 03 01 f3 48 0f bc c2 89 c0 48 9
    RSP: 0018:ffffc90000fdbc58 EFLAGS: 00000202
    RAX: 0000000100000003 RBX: ffff888100b3dfa0 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000002fe80
    RBP: ffff888100b3dfb0 R08: 0000000000000001 R09: 0000000000000000
    R10: ffffc90000fdbcb0 R11: 0000000000000004 R12: 0000000000000002
    R13: ffff888100a92b48 R14: 0000000000000000 R15: 0000000000000000
    FS:  00007f44a5425740(0000) GS:ffff888237d80000(0000) knlGS:0000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000100030973 CR3: 000000010722c000 CR4: 00000000000006e0
    Call Trace:
     <TASK>
     ? show_regs+0x8c/0xa0
     ? __die_body+0x23/0xa0
     ? __die+0x3a/0x50
     ? page_fault_oops+0x1d2/0x5c0
     ? partition_sched_domains_locked+0x483/0x600
     ? search_module_extables+0x2a/0xb0
     ? search_exception_tables+0x67/0x90
     ? kernelmode_fixup_or_oops+0x144/0x1b0
     ? __bad_area_nosemaphore+0x211/0x360
     ? up_read+0x3b/0x50
     ? bad_area_nosemaphore+0x1a/0x30
     ? exc_page_fault+0x890/0xd90
     ? __lock_acquire.constprop.0+0x24f/0x8d0
     ? __lock_acquire.constprop.0+0x24f/0x8d0
     ? asm_exc_page_fault+0x26/0x30
     ? partition_sched_domains_locked+0x483/0x600
     ? partition_sched_domains_locked+0xf0/0x600
     rebuild_sched_domains_locked+0x806/0xdc0
     update_partition_sd_lb+0x118/0x130
     cpuset_write_resmask+0xffc/0x1420
     cgroup_file_write+0xb2/0x290
     kernfs_fop_write_iter+0x194/0x290
     new_sync_write+0xeb/0x160
     vfs_write+0x16f/0x1d0
     ksys_write+0x81/0x180
     __x64_sys_write+0x21/0x30
     x64_sys_call+0x2f25/0x4630
     do_syscall_64+0x44/0xb0
     entry_SYSCALL_64_after_hwframe+0x78/0xe2
    RIP: 0033:0x7f44a553c887
    
    It can be reproduced with the following commands:
    cd /sys/fs/cgroup/
    mkdir test
    cd test/
    echo +cpuset > ../cgroup.subtree_control
    echo root > cpuset.cpus.partition
    cat /sys/fs/cgroup/cpuset.cpus.effective
    0-3
    echo 0-3 > cpuset.cpus // taking away all cpus from root
    
    This issue is caused by the incorrect rebuilding of scheduling domains.
    In this scenario, test/cpuset.cpus.partition should be an invalid root
    and should not trigger the rebuilding of scheduling domains. When calling
    update_parent_effective_cpumask with partcmd_update, if newmask is not
    null, it should recheck whether any CPUs remain available for the
    parent/cs that has tasks.
    
    Fixes: 0c7f293efc87 ("cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2")
    Cc: stable@vger.kernel.org # v6.7+
    Signed-off-by: Chen Ridong <chenridong@huawei.com>
    Signed-off-by: Waiman Long <longman@redhat.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

char: xillybus: Check USB endpoints when probing device [+ + +]

Author: Eli Billauer <eli.billauer@gmail.com>
Date:   Fri Aug 16 10:02:00 2024 +0300

    char: xillybus: Check USB endpoints when probing device
    
    commit 2374bf7558de915edc6ec8cb10ec3291dfab9594 upstream.
    
    Ensure, as the driver probes the device, that all endpoints that the
    driver may attempt to access exist and are of the correct type.
    
    All XillyUSB devices must have a Bulk IN and Bulk OUT endpoint at
    address 1. This is verified in xillyusb_setup_base_eps().
    
    On top of that, a XillyUSB device may have additional Bulk OUT
    endpoints. The information about these endpoints' addresses is deduced
    from a data structure (the IDT) that the driver fetches from the device
    while probing it. These endpoints are checked in setup_channels().
    
    A XillyUSB device never has more than one IN endpoint, as all data
    towards the host is multiplexed in this single Bulk IN endpoint. This is
    why setup_channels() only checks OUT endpoints.
    
    Reported-by: syzbot+eac39cba052f2e750dbe@syzkaller.appspotmail.com
    Cc: stable <stable@kernel.org>
    Closes: https://lore.kernel.org/all/0000000000001d44a6061f7a54ee@google.com/T/
    Fixes: a53d1202aef1 ("char: xillybus: Add driver for XillyUSB (Xillybus variant for USB)")
    Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
    Link: https://lore.kernel.org/r/20240816070200.50695-2-eli.billauer@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

char: xillybus: Don't destroy workqueue from work item running on it [+ + +]

Author: Eli Billauer <eli.billauer@gmail.com>
Date:   Thu Aug 1 15:11:26 2024 +0300

    char: xillybus: Don't destroy workqueue from work item running on it
    
    commit ccbde4b128ef9c73d14d0d7817d68ef795f6d131 upstream.
    
    Triggered by a kref decrement, destroy_workqueue() may be called from
    within a work item for destroying its own workqueue. This illegal
    situation is averted by adding a module-global workqueue for exclusive
    use of the offending work item. Other work items continue to be queued
    on per-device workqueues to ensure performance.
    
    Reported-by: syzbot+91dbdfecdd3287734d8e@syzkaller.appspotmail.com
    Cc: stable <stable@kernel.org>
    Closes: https://lore.kernel.org/lkml/0000000000000ab25a061e1dfe9f@google.com/
    Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
    Link: https://lore.kernel.org/r/20240801121126.60183-1-eli.billauer@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

char: xillybus: Refine workqueue handling [+ + +]

Author: Eli Billauer <eli.billauer@gmail.com>
Date:   Fri Aug 16 10:01:59 2024 +0300

    char: xillybus: Refine workqueue handling
    
    commit ad899c301c880766cc709aad277991b3ab671b66 upstream.
    
    As the wakeup work item now runs on a separate workqueue, it needs to be
    flushed separately along with flushing the device's workqueue.
    
    Also, move the destroy_workqueue() call to the end of the exit method,
    so that deinitialization is done in the opposite order of
    initialization.
    
    Fixes: ccbde4b128ef ("char: xillybus: Don't destroy workqueue from work item running on it")
    Cc: stable <stable@kernel.org>
    Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
    Link: https://lore.kernel.org/r/20240816070200.50695-1-eli.billauer@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cifs: Add a tracepoint to track credits involved in R/W requests [+ + +]

Author: David Howells <dhowells@redhat.com>
Date:   Thu May 23 10:01:08 2024 +0100

    cifs: Add a tracepoint to track credits involved in R/W requests
    
    [ Upstream commit 519be989717c5bffaed1dc14a439e3872cb4bb8d ]
    
    Add a tracepoint to track the credit changes and server in_flight value
    involved in the lifetime of a R/W request, logging it against the
    request/subreq debugging ID.  This requires the debugging IDs to be
    recorded in the cifs_credits struct.
    
    The tracepoint can be enabled with:
    
            echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable
    
    Also add a three-state flag to struct cifs_credits to note if we're
    interested in determining when the in_flight contribution ends and, if so,
    to track whether we've decremented the contribution yet.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    cc: Jeff Layton <jlayton@kernel.org>
    cc: linux-cifs@vger.kernel.org
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Stable-dep-of: 74c2ab6d653b ("smb/client: avoid possible NULL dereference in cifs_free_subrequest()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cpu/SMT: Enable SMT only if a core is online [+ + +]

Author: Nysal Jan K.A <nysal@linux.ibm.com>
Date:   Wed Jul 31 08:31:12 2024 +0530

    cpu/SMT: Enable SMT only if a core is online
    
    [ Upstream commit 6c17ea1f3eaa330d445ac14a9428402ce4e3055e ]
    
    If a core is offline then enabling SMT should not online CPUs of
    this core. By enabling SMT, what is intended is either changing the SMT
    value from "off" to "on" or setting the SMT level (threads per core) from a
    lower to higher value.
    
    On PowerPC the ppc64_cpu utility can be used, among other things, to
    perform the following functions:
    
    ppc64_cpu --cores-on                # Get the number of online cores
    ppc64_cpu --cores-on=X              # Put exactly X cores online
    ppc64_cpu --offline-cores=X[,Y,...] # Put specified cores offline
    ppc64_cpu --smt={on|off|value}      # Enable, disable or change SMT level
    
    If the user has decided to offline certain cores, enabling SMT should
    not online CPUs in those cores. This patch fixes the issue and changes
    the behaviour as described, by introducing an arch specific function
    topology_is_core_online(). It is currently implemented only for PowerPC.
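
    The behaviour change can be illustrated with a toy model. This is not
    the kernel implementation: enable_smt() and its parameters are
    hypothetical, and the check_core_online flag merely stands in for the
    new topology_is_core_online() test.

```python
def enable_smt(cores, online, level, check_core_online=True):
    """Return the set of online CPUs after raising the SMT level.

    cores:  dict mapping a core id to the list of its CPU (thread) ids
    online: set of CPU ids that are online before the operation
    level:  requested number of threads per core
    """
    result = set(online)
    for threads in cores.values():
        core_online = any(cpu in online for cpu in threads)
        if check_core_online and not core_online:
            # The user offlined this whole core, so leave it alone.
            continue
        result.update(threads[:level])
    return result


# Two cores with four threads each; core 1 was put offline by the user.
cores = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
online = {0}
```

    With the check enabled, enable_smt(cores, online, 4) brings up only
    the threads of core 0; without it, the CPUs of the offlined core 1
    would come online as well.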
    
    Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support")
    Reported-by: Tyrel Datwyler <tyreld@linux.ibm.com>
    Closes: https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ
    Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com>
    Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240731030126.956210-2-nysal@linux.ibm.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cxgb4: add forgotten u64 ivlan cast before shift [+ + +]

Author: Nikolay Kuratov <kniv@yandex-team.ru>
Date:   Mon Aug 19 10:54:08 2024 +0300

    cxgb4: add forgotten u64 ivlan cast before shift
    
    commit 80a1e7b83bb1834b5568a3872e64c05795d88f31 upstream.
    
    It is done everywhere in cxgb4 code, e.g. in is_filter_exact_match().
    There is no reason it should not be done here.
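
    Why the cast matters can be shown by emulating fixed-width arithmetic.
    The sketch below is an emulation in Python, not the driver code: the
    shift amount of 20 and the sample value are hypothetical, and the
    32-bit mask models the typical loss of high bits when a 16-bit value
    is shifted at int width in C (where such overflow is undefined).

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def shift_at_32_bits(value, shift):
    """Emulate a shift performed at 32-bit width: high bits are lost."""
    return (value << shift) & U32_MASK


def shift_at_64_bits(value, shift):
    """Emulate the shift after casting the operand to u64 first."""
    return (value << shift) & U64_MASK


ivlan = 0xFFFF   # sample 16-bit value
shift = 20       # hypothetical field position within a 64-bit word

without_cast = shift_at_32_bits(ivlan, shift)
with_cast = shift_at_64_bits(ivlan, shift)
```

    In this emulation the uncast result loses the top four bits of the
    value, while the cast version keeps all sixteen.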
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE
    
    Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
    Cc: stable@vger.kernel.org
    Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://patch.msgid.link/20240819075408.92378-1-kniv@yandex-team.ru
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm persistent data: fix memory allocation failure [+ + +]

Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Tue Aug 13 16:35:14 2024 +0200

    dm persistent data: fix memory allocation failure
    
    commit faada2174c08662ae98b439c69efe3e79382c538 upstream.
    
    kmalloc is unreliable when allocating more than 8 pages of memory. It may
    fail when there is plenty of free memory but the memory is fragmented.
    Zdenek Kabelac observed such failure in his tests.
    
    This commit changes kmalloc to kvmalloc - kvmalloc will fall back to
    vmalloc if the large allocation fails.
    
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
    Reviewed-by: Mike Snitzer <snitzer@kernel.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm resume: don't return EINVAL when signalled [+ + +]

Author: Khazhismel Kumykov <khazhy@google.com>
Date:   Tue Aug 13 12:39:52 2024 +0200

    dm resume: don't return EINVAL when signalled
    
    commit 7a636b4f03af9d541205f69e373672e7b2b60a8a upstream.
    
    If the dm_resume method is called on a device that is not suspended, the
    method will suspend the device briefly, before resuming it (so that the
    table will be swapped).
    
    However, the return value of dm_suspend was not checked. dm_suspend may
    return an error when it is interrupted by a signal. In this case,
    do_resume would call dm_swap_table, which would return -EINVAL.
    
    This commit fixes the logic, so that error returned by dm_suspend is
    checked and the resume operation is undone.
    
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm suspend: return -ERESTARTSYS instead of -EINTR [+ + +]

Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Tue Aug 13 12:38:51 2024 +0200

    dm suspend: return -ERESTARTSYS instead of -EINTR
    
    [ Upstream commit 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 ]
    
    This commit changes device mapper, so that it returns -ERESTARTSYS
    instead of -EINTR when it is interrupted by a signal (so that the ioctl
    can be restarted).
    
    The manpage signal(7) says that the ioctl function should be restarted if
    the signal was handled with SA_RESTART.
    
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dpaa2-switch: Fix error checking in dpaa2_switch_seed_bp() [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Sat Aug 17 09:52:46 2024 +0300

    dpaa2-switch: Fix error checking in dpaa2_switch_seed_bp()
    
    [ Upstream commit c50e7475961c36ec4d21d60af055b32f9436b431 ]
    
    The dpaa2_switch_add_bufs() function returns the number of bufs that it
    was able to add.  It returns BUFS_PER_CMD (7) for complete success or a
    smaller number if there are not enough pages available.  However, the
    error checking is looking at the total number of bufs instead of the
    number which were added on this iteration.  Thus the error checking
    only works correctly for the first iteration through the loop and
    subsequent iterations are always counted as a success.
    
    Fix this by checking only the bufs added in the current iteration.
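
    The difference between the two checks can be modelled as below. This
    is a simplified sketch rather than the driver code: seed_buggy() and
    seed_fixed() are hypothetical names, and the list argument stands in
    for the successive return values of dpaa2_switch_add_bufs().

```python
BUFS_PER_CMD = 7


def seed_buggy(add_bufs_results):
    """Old check: compares the running total against BUFS_PER_CMD."""
    total = 0
    for added in add_bufs_results:
        total += added
        if total < BUFS_PER_CMD:
            return False   # reported as an error
    return True


def seed_fixed(add_bufs_results):
    """New check: compares only the bufs added in this iteration."""
    for added in add_bufs_results:
        if added < BUFS_PER_CMD:
            return False   # reported as an error
    return True
```

    A short second iteration such as [7, 3, 7] is counted as a success by
    the old check, because the total is already 7 or more after the first
    iteration, while the new check reports the failure.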
    
    Fixes: 0b1b71370458 ("staging: dpaa2-switch: handle Rx path on control interface")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
    Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
    Link: https://patch.msgid.link/eec27f30-b43f-42b6-b8ee-04a6f83423b6@stanley.mountain
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd/amdgpu: command submission parser for JPEG [+ + +]

Author: David (Ming Qiang) Wu <David.Wu3@amd.com>
Date:   Thu Aug 8 12:19:50 2024 -0400

    drm/amd/amdgpu: command submission parser for JPEG
    
    commit 470516c2925493594a690bc4d05b1f4471d9f996 upstream.
    
    Add JPEG IB command parser to ensure registers
    in the command are within the JPEG IP block.
    
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit a7f670d5d8e77b092404ca8a35bb0f8f89ed3117)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Adjust cursor position [+ + +]

Author: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Date:   Thu Aug 1 16:16:35 2024 -0600

    drm/amd/display: Adjust cursor position
    
    commit 56fb276d0244d430496f249335a44ae114dd5f54 upstream.
    
    [why & how]
    When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position
    on horizontal mirror") was introduced, it used the wrong calculation for
    the position copy for X. This commit uses the correct calculation for that
    based on the original patch.
    
    Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Acked-by: Wayne Lin <wayne.lin@amd.com>
    Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
    Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit 8f9b23abbae5ffcd64856facd26a86b67195bc2f)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Don't register panel_power_savings on OLED panels [+ + +]

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Thu May 9 12:05:24 2024 -0500

    drm/amd/display: Don't register panel_power_savings on OLED panels
    
    commit 76cb763e6ea62e838ccc8f7a1ea4246d690fccc9 upstream.
    
    OLED panels don't support ABM, so they shouldn't offer the
    panel_power_savings attribute to the user. Check whether aux BL
    control support was enabled to decide whether to offer it.
    
    Reported-by: Gergo Koteles <soyer@irl.hu>
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3359
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Reviewed-by: Harry Wentland <harry.wentland@amd.com>
    Tested-by: Gergo Koteles <soyer@irl.hu>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Enable otg synchronization logic for DCN321 [+ + +]

Author: Loan Chen <lo-an.chen@amd.com>
Date:   Fri Aug 2 13:57:40 2024 +0800

    drm/amd/display: Enable otg synchronization logic for DCN321
    
    commit 0dbb81d44108a2a1004e5b485ef3fca5bc078424 upstream.
    
    [Why]
    Tiled display cannot synchronize properly after S3.
    The fix from commit 5f0c74915815 ("drm/amd/display: Fix for otg
    synchronization logic") is not enabled in DCN321, which causes
    the otg to be excluded from synchronization.
    
    [How]
    Enable otg synchronization logic in dcn321.
    
    Fixes: 5f0c74915815 ("drm/amd/display: Fix for otg synchronization logic")
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
    Signed-off-by: Loan Chen <lo-an.chen@amd.com>
    Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit d6ed53712f583423db61fbb802606759e023bf7b)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: fix cursor offset on rotation 180 [+ + +]

Author: Melissa Wen <mwen@igalia.com>
Date:   Tue Jan 31 15:05:46 2023 -0100

    drm/amd/display: fix cursor offset on rotation 180
    
    commit 737222cebecbdbcdde2b69475c52bcb9ecfeb830 upstream.
    
    [why & how]
    Cursor gets clipped off in the middle of the screen with hw
    rotation 180. Fix a miscalculation of cursor offset when it's
    placed near the edges in the pipe split case.
    
    Cursor bugs with hw rotation were reported on AMD issue
    tracker:
    https://gitlab.freedesktop.org/drm/amd/-/issues/2247
    
    The issues on rotation 270 were fixed by:
    https://lore.kernel.org/amd-gfx/20221118125935.4013669-22-Brian.Chang@amd.com/
    that partially addressed rotation 180 too. So, this patch provides the
    final bits for rotation 180.
    
    Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2247
    Reviewed-by: Harry Wentland <harry.wentland@amd.com>
    Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
    Signed-off-by: Melissa Wen <mwen@igalia.com>
    Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit 1fd2cf090096af8a25bf85564341cfc21cec659d)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: fix s2idle entry for DCN3.5+ [+ + +]

Author: Hamza Mahfooz <hamza.mahfooz@amd.com>
Date:   Tue Aug 6 09:55:55 2024 -0400

    drm/amd/display: fix s2idle entry for DCN3.5+
    
    commit f6098641d3e1e4d4052ff9378857c831f9675f6b upstream.
    
    To be able to get to the lowest power state when suspending systems with
    DCN3.5+, we must be in IPS before the display hardware is put into
    D3cold. So, to ensure that the system always reaches the lowest power
    state while suspending, force systems that support IPS to enter idle
    optimizations before entering D3cold.
    
    Reviewed-by: Roman Li <roman.li@amd.com>
    Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit 237193e21b29d4aa0617ffeea3d6f49e72999708)
    Cc: stable@vger.kernel.org # 6.10+
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/jpeg2: properly set atomics vmid field [+ + +]

Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Jul 12 10:00:33 2024 -0400

    drm/amdgpu/jpeg2: properly set atomics vmid field
    
    commit e414a304f2c5368a84f03ad34d29b89f965a33c9 upstream.
    
    This needs to be set as well if the IB uses atomics.
    
    Reviewed-by: Leo Liu <leo.liu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit 35c628774e50b3784c59e8ca7973f03bcb067132)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/jpeg4: properly set atomics vmid field [+ + +]

Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Jul 12 10:06:05 2024 -0400

    drm/amdgpu/jpeg4: properly set atomics vmid field
    
    commit e6c6bd6253e792cee6c5c065e106e87b9f0d9ae9 upstream.
    
    This needs to be set as well if the IB uses atomics.
    
    Reviewed-by: Leo Liu <leo.liu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit c6c2e8b6a427d4fecc7c36cffccb908185afcab2)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1 [+ + +]

Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Aug 14 10:28:24 2024 -0400

    drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
    
    commit e3e4bf58bad1576ac732a1429f53e3d4bfb82b4b upstream.
    
    The workaround seems to cause stability issues on other
    SDMA 5.2.x IPs.
    
    Fixes: a03ebf116303 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell")
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556
    Acked-by: Ruijing Dong <ruijing.dong@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit 2dc3851ef7d9c5439ea8e9623fc36878f3b40649)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/vcn: identify unified queue in sw init [+ + +]

Author: Boyuan Zhang <boyuan.zhang@amd.com>
Date:   Thu Jul 11 16:19:54 2024 -0400

    drm/amdgpu/vcn: identify unified queue in sw init
    
    commit ecfa23c8df7ef3ea2a429dfe039341bf792e95b4 upstream.
    
    Determine whether VCN is using a unified queue in sw_init, instead of calling
    functions later on.
    
    v2: fix coding style
    
    Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu/vcn: not pause dpg for unified queue [+ + +]

Author: Boyuan Zhang <boyuan.zhang@amd.com>
Date:   Wed Jul 10 16:17:12 2024 -0400

    drm/amdgpu/vcn: not pause dpg for unified queue
    
    commit 7d75ef3736a025db441be652c8cc8e84044a215f upstream.
    
    For unified queue, DPG pause for encoding is done inside VCN firmware,
    so there is no need to pause dpg based on ring type in kernel.
    
    For VCN3 and below, pausing DPG for encoding in kernel is still needed.
    
    v2: add more comments
    v3: update commit message
    
    Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: Actually check flags for all context ops. [+ + +]

Author: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Date:   Tue Aug 6 22:27:32 2024 +0200

    drm/amdgpu: Actually check flags for all context ops.
    
    commit 0573a1e2ea7e35bff08944a40f1adf2bb35cea61 upstream.
    
    Missing validation ...
    
    Checked libdrm and it clears all the structs, so we should be
    safe to just check everything.
    
    Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit c6b86421f1f9ddf9d706f2453159813ee39d0cf9)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amdgpu: Validate TA binary size [+ + +]

Author: Candice Li <candice.li@amd.com>
Date:   Thu Aug 15 11:37:28 2024 +0800

    drm/amdgpu: Validate TA binary size
    
    commit c99769bceab4ecb6a067b9af11f9db281eea3e2a upstream.
    
    Add TA binary size validation to avoid OOB write.
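    
    A minimal sketch of this kind of size check, not the amdgpu code itself.
    The buffer size and the names load_ta_binary() and SHARED_BUF_SIZE are
    hypothetical and chosen only for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHARED_BUF_SIZE 64u /* hypothetical size of the destination buffer */

/*
 * Copy a binary blob into a fixed-size buffer.
 * Returns 0 on success, -1 if the blob is missing, empty, or too large.
 * The size is validated before memcpy() so the copy can never write
 * past the end of dst.
 */
int load_ta_binary(uint8_t dst[SHARED_BUF_SIZE], const uint8_t *bin,
                   size_t bin_size)
{
    if (bin == NULL || bin_size == 0 || bin_size > SHARED_BUF_SIZE)
        return -1;

    memcpy(dst, bin, bin_size);
    return 0;
}
```

    The point is only the ordering: the length is rejected before any byte
    is copied.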
    
    Signed-off-by: Candice Li <candice.li@amd.com>
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit c0a04e3570d72aaf090962156ad085e37c62e442)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/i915/hdcp: Use correct cp_irq_count [+ + +]

Author: Suraj Kandpal <suraj.kandpal@intel.com>
Date:   Fri Aug 9 17:11:28 2024 +0530

    drm/i915/hdcp: Use correct cp_irq_count
    
    [ Upstream commit 5d41eeb6725e3e24853629e5d7635e4bc45d736e ]
    
    We are checking cp_irq_count from the wrong hdcp structure which
    ends up giving timed out errors. We only increment the cp_irq_count
    of the primary connector's hdcp structure but here in case of
    multidisplay setup we end up checking the secondary connector's hdcp
    structure, which will not have its cp_irq_count incremented. This leads
    to a timed out at CP_IRQ error even though a CP_IRQ was raised. Extract
    it from the correct intel_hdcp structure.
    
    --v2
    -Explain why it was the wrong hdcp structure [Jani]
    
    Fixes: 8c9e4f68b861 ("drm/i915/hdcp: Use per-device debugs")
    Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
    Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240809114127.3940699-2-suraj.kandpal@intel.com
    (cherry picked from commit dd925902634def895690426bf10e0a8b3e56f56d)
    Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dp: fix the max supported bpp logic [+ + +]

Author: Abhinav Kumar <quic_abhinavk@quicinc.com>
Date:   Mon Aug 5 13:20:08 2024 -0700

    drm/msm/dp: fix the max supported bpp logic
    
    [ Upstream commit d19d5b8d8f6dab942ce5ddbcf34bf7275e778250 ]
    
    Fix the dp_panel_get_supported_bpp() API to return the minimum
    supported bpp correctly for relevant cases and use this API
    to correct the behavior of DP driver which hard-codes the max supported
    bpp to 30.
    
    This is incorrect because the number of lanes and max data rate
    supported by the lanes need to be taken into account.
    
    Replace the hardcoded limit with the appropriate math which accounts
    for the accurate number of lanes and max data rate.
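    
    The calculation can be sketched as follows. This is an illustration
    under assumptions, not the driver's code: supported_bpp() is a
    hypothetical name, the lower bound of 18 and the step of 6 are assumed,
    and the link is assumed to carry 8 payload bits per lane per link clock:

```c
#include <stdint.h>

/*
 * Pick the highest bpp that the link can carry for a given pixel clock.
 * edid_bpp:      bpp requested by the sink
 * pclk_khz:      pixel clock of the mode in kHz
 * lanes:         number of active lanes
 * link_rate_khz: per-lane link clock in kHz
 */
uint32_t supported_bpp(uint32_t edid_bpp, uint32_t pclk_khz,
                       uint32_t lanes, uint32_t link_rate_khz)
{
    const uint32_t max_bpp = 30; /* the former hard-coded limit */
    const uint32_t min_bpp = 18; /* assumed lower bound */
    uint32_t bpp = edid_bpp < max_bpp ? edid_bpp : max_bpp;
    /* 64-bit math so large clocks cannot overflow */
    uint64_t data_rate_khz = (uint64_t)lanes * link_rate_khz * 8;

    while (bpp > min_bpp) {
        if ((uint64_t)pclk_khz * bpp <= data_rate_khz)
            return bpp;
        bpp -= 6; /* assumed step: one step down in bits per component */
    }
    return min_bpp;
}
```

    With these assumptions, a 594000 kHz mode on 4 lanes at 540000 kHz does
    not fit at 30 bpp (17820000 > 17280000) and falls back to 24 bpp.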
    
    changes in v2:
            - Fix the dp_panel_get_supported_bpp() and use it
            - Drop the max_t usage as dp_panel_get_supported_bpp() already
              returns the min_bpp correctly now
    
    changes in v3:
            - replace min_t with just min as all params are u32
    
    Fixes: c943b4948b58 ("drm/msm/dp: add displayPort driver support")
    Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/43
    Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # SM8350-HDK
    Reviewed-by: Stephen Boyd <swboyd@chromium.org>
    Patchwork: https://patchwork.freedesktop.org/patch/607073/
    Link: https://lore.kernel.org/r/20240805202009.1120981-1-quic_abhinavk@quicinc.com
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dp: reset the link phy params before link training [+ + +]

Author: Abhinav Kumar <quic_abhinavk@quicinc.com>
Date:   Thu Jul 25 15:04:50 2024 -0700

    drm/msm/dp: reset the link phy params before link training
    
    [ Upstream commit 319aca883bfa1b85ee08411541b51b9a934ac858 ]
    
    Before restarting link training, reset the link phy params, namely
    the pre-emphasis and voltage swing levels; otherwise the next
    link training begins at the previously cached levels, which can result
    in link training failures.
    
    Fixes: 8ede2ecc3e5e ("drm/msm/dp: Add DP compliance tests on Snapdragon Chipsets")
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # SM8350-HDK
    Reviewed-by: Stephen Boyd <swboyd@chromium.org>
    Patchwork: https://patchwork.freedesktop.org/patch/605946/
    Link: https://lore.kernel.org/r/20240725220450.131245-1-quic_abhinavk@quicinc.com
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dpu: cleanup FB if dpu_format_populate_layout fails [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Tue Jun 25 00:13:41 2024 +0300

    drm/msm/dpu: cleanup FB if dpu_format_populate_layout fails
    
    [ Upstream commit bfa1a6283be390947d3649c482e5167186a37016 ]
    
    If the dpu_format_populate_layout() fails, then FB is prepared, but not
    cleaned up. This ends up leaking the pin_count on the GEM object and
    causes a splat during DRM file closure:
    
    msm_obj->pin_count
    WARNING: CPU: 2 PID: 569 at drivers/gpu/drm/msm/msm_gem.c:121 update_lru_locked+0xc4/0xcc
    [...]
    Call trace:
     update_lru_locked+0xc4/0xcc
     put_pages+0xac/0x100
     msm_gem_free_object+0x138/0x180
     drm_gem_object_free+0x1c/0x30
     drm_gem_object_handle_put_unlocked+0x108/0x10c
     drm_gem_object_release_handle+0x58/0x70
     idr_for_each+0x68/0xec
     drm_gem_release+0x28/0x40
     drm_file_free+0x174/0x234
     drm_release+0xb0/0x160
     __fput+0xc0/0x2c8
     __fput_sync+0x50/0x5c
     __arm64_sys_close+0x38/0x7c
     invoke_syscall+0x48/0x118
     el0_svc_common.constprop.0+0x40/0xe0
     do_el0_svc+0x1c/0x28
     el0_svc+0x4c/0x120
     el0t_64_sync_handler+0x100/0x12c
     el0t_64_sync+0x190/0x194
    irq event stamp: 129818
    hardirqs last  enabled at (129817): [<ffffa5f6d953fcc0>] console_unlock+0x118/0x124
    hardirqs last disabled at (129818): [<ffffa5f6da7dcf04>] el1_dbg+0x24/0x8c
    softirqs last  enabled at (129808): [<ffffa5f6d94afc18>] handle_softirqs+0x4c8/0x4e8
    softirqs last disabled at (129785): [<ffffa5f6d94105e4>] __do_softirq+0x14/0x20
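    
    The error-path pattern behind this fix can be sketched as below. All
    names (gem_object, fb_prepare(), fb_cleanup(), populate_layout(),
    plane_prepare_fb()) are hypothetical stand-ins, not the DPU functions:

```c
#include <errno.h>

struct gem_object {
    int pin_count; /* must return to 0 before the object is freed */
};

/* Pins the backing object. */
int fb_prepare(struct gem_object *obj)
{
    obj->pin_count++;
    return 0;
}

/* Undoes fb_prepare(). */
void fb_cleanup(struct gem_object *obj)
{
    obj->pin_count--;
}

/* Stand-in for the layout step; fails on request. */
int populate_layout(int should_fail)
{
    return should_fail ? -EINVAL : 0;
}

int plane_prepare_fb(struct gem_object *obj, int layout_fails)
{
    int ret = fb_prepare(obj);

    if (ret)
        return ret;

    ret = populate_layout(layout_fails);
    if (ret) {
        /* Without this call the pin taken above would leak. */
        fb_cleanup(obj);
        return ret;
    }
    return 0;
}
```

    On the failing path the pin count is back at zero when the function
    returns, which is what avoids the warning at object release.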
    
    Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/600714/
    Link: https://lore.kernel.org/r/20240625-dpu-mode-config-width-v5-1-501d984d634f@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dpu: don't play tricks with debug macros [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Fri Aug 2 22:47:34 2024 +0300

    drm/msm/dpu: don't play tricks with debug macros
    
    [ Upstream commit df24373435f5899a2a98b7d377479c8d4376613b ]
    
    DPU debugging macros need to be converted to proper drm_debug_*
    macros; however, this is going to be an intrusive patch, not suitable for a
    fix. Wire DPU_DEBUG and DPU_DEBUG_DRIVER to always use DRM_DEBUG_DRIVER
    to make sure that DPU debugging messages always end up in the drm debug
    messages and are controlled via the usual drm.debug mask.
    
    I don't think that it is a good idea for a generic DPU_DEBUG macro to be
    tied to DRM_UT_KMS. It is used to report a debug message from driver, so by
    default it should go to the DRM_UT_DRIVER channel. While refactoring
    debug macros later on we might end up with particular messages going to
    ATOMIC or KMS, but DRIVER should be the default.
    
    Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/606932/
    Link: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-2-7eac9eb8e895@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dpu: limit QCM2290 to RGB formats only [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Thu Jun 27 00:45:55 2024 +0300

    drm/msm/dpu: limit QCM2290 to RGB formats only
    
    [ Upstream commit 2db13c4a631505029ada9404e09a2b06a268c1c4 ]
    
    The QCM2290 doesn't have CSC blocks, so it can not support YUV formats
    even on ViG blocks. Fix the formats declared by _VIG_SBLK_NOSCALE().
    
    Fixes: 5334087ee743 ("drm/msm: add support for QCM2290 MDSS")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/601048/
    Link: https://lore.kernel.org/r/20240627-dpu-virtual-wide-v5-1-5efb90cbb8be@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dpu: move dpu_encoder's connector assignment to atomic_enable() [+ + +]

Author: Abhinav Kumar <quic_abhinavk@quicinc.com>
Date:   Wed Jul 31 12:17:22 2024 -0700

    drm/msm/dpu: move dpu_encoder's connector assignment to atomic_enable()
    
    [ Upstream commit aedf02e46eb549dac8db4821a6b9f0c6bf6e3990 ]
    
    For cases where the crtc's connectors_changed was set without enable/active
    getting toggled, there is an atomic_enable() call followed by an
    atomic_disable() but without an atomic_mode_set().
    
    This results in a NULL ptr access for the dpu_encoder_get_drm_fmt() call in
    the atomic_enable() as the dpu_encoder's connector was cleared in the
    atomic_disable() but not re-assigned as there was no atomic_mode_set() call.
    
    Fix the NULL ptr access by moving the assignment for atomic_enable() and also
    use drm_atomic_get_new_connector_for_encoder() to get the connector from
    the atomic_state.
    
    Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support")
    Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/59
    Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # SM8350-HDK
    Patchwork: https://patchwork.freedesktop.org/patch/606729/
    Link: https://lore.kernel.org/r/20240731191723.3050932-1-quic_abhinavk@quicinc.com
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dpu: relax YUV requirements [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Thu Jun 27 00:45:56 2024 +0300

    drm/msm/dpu: relax YUV requirements
    
    [ Upstream commit cb18195914e353ece0e789e365a5a16872169805 ]
    
    YUV formats require only CSC to be enabled. Even decimated formats
    should not require scaler. Relax the requirement and don't check for the
    scaler block while checking if YUV format can be enabled.
    
    Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support")
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Patchwork: https://patchwork.freedesktop.org/patch/601049/
    Link: https://lore.kernel.org/r/20240627-dpu-virtual-wide-v5-2-5efb90cbb8be@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm/dpu: take plane rotation into account for wide planes [+ + +]

Author: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Date:   Thu Jun 27 00:45:57 2024 +0300

    drm/msm/dpu: take plane rotation into account for wide planes
    
    [ Upstream commit d3a785e4f983f523380e023d8a05fb6d04402957 ]
    
    Take into account the plane rotation and flipping when calculating src
    positions for the wide plane parts.
    
    This is not an issue yet, because rotation is only supported for the
    UBWC planes and wide UBWC planes are rejected anyway because in the parallel
    multirect case only half of the usual width is supported for tiled
    formats. However it's better to fix this now rather than stumbling upon
    it later.
    
    Fixes: 80e8ae3b38ab ("drm/msm/dpu: add support for wide planes")
    Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Patchwork: https://patchwork.freedesktop.org/patch/601059/
    Link: https://lore.kernel.org/r/20240627-dpu-virtual-wide-v5-3-5efb90cbb8be@linaro.org
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/msm: fix the highest_bank_bit for sc7180 [+ + +]

Author: Abhinav Kumar <quic_abhinavk@quicinc.com>
Date:   Thu Aug 8 16:52:27 2024 -0700

    drm/msm: fix the highest_bank_bit for sc7180
    
    [ Upstream commit 3e30296b374af33cb4c12ff93df0b1e5b2d0f80b ]
    
    sc7180 programs the ubwc settings as 0x1e as that would mean a
    highest bank bit of 14 which matches what the GPU sets as well.
    
    However, the highest_bank_bit field of the msm_mdss_data which is
    being used to program the SSPP's fetch configuration is programmed
    to a highest bank bit of 16 as 0x3 translates to 16 and not 14.
    
    Fix the highest bank bit field used for the SSPP to match the mdss
    and gpu settings.
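    
    The field encoding implied above (0x3 meaning bit 16) can be sketched as
    an offset from 13. This mapping is inferred from the commit text only;
    HBB_BASE and both helper names are hypothetical:

```c
#include <stdint.h>

#define HBB_BASE 13u /* assumed: field value 0 means bank bit 13 */

/* Decode a highest_bank_bit field value into the actual bit number. */
uint32_t hbb_from_field(uint32_t field)
{
    return HBB_BASE + field;
}

/* Encode a bit number into the field value; hbb must be >= HBB_BASE. */
uint32_t hbb_to_field(uint32_t hbb)
{
    return hbb - HBB_BASE;
}
```

    Under this assumption a highest bank bit of 14 is encoded as 0x1, while
    the previous value 0x3 decodes to 16.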
    
    Fixes: 6f410b246209 ("drm/msm/mdss: populate missing data")
    Reviewed-by: Rob Clark <robdclark@gmail.com>
    Tested-by: Stephen Boyd <swboyd@chromium.org> # Trogdor.Lazor
    Patchwork: https://patchwork.freedesktop.org/patch/607625/
    Link: https://lore.kernel.org/r/20240808235227.2701479-1-quic_abhinavk@quicinc.com
    Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/v3d: Fix out-of-bounds read in `v3d_csd_job_run()` [+ + +]

Author: Maíra Canal <mcanal@igalia.com>
Date:   Fri Aug 9 12:18:45 2024 -0300

    drm/v3d: Fix out-of-bounds read in `v3d_csd_job_run()`
    
    [ Upstream commit 497d370a644d95a9f04271aa92cb96d32e84c770 ]
    
    When enabling UBSAN on Raspberry Pi 5, we get the following warning:
    
    [  387.894977] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/v3d/v3d_sched.c:320:3
    [  387.903868] index 7 is out of range for type '__u32 [7]'
    [  387.909692] CPU: 0 PID: 1207 Comm: kworker/u16:2 Tainted: G        WC         6.10.3-v8-16k-numa #151
    [  387.919166] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
    [  387.925961] Workqueue: v3d_csd drm_sched_run_job_work [gpu_sched]
    [  387.932525] Call trace:
    [  387.935296]  dump_backtrace+0x170/0x1b8
    [  387.939403]  show_stack+0x20/0x38
    [  387.942907]  dump_stack_lvl+0x90/0xd0
    [  387.946785]  dump_stack+0x18/0x28
    [  387.950301]  __ubsan_handle_out_of_bounds+0x98/0xd0
    [  387.955383]  v3d_csd_job_run+0x3a8/0x438 [v3d]
    [  387.960707]  drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
    [  387.966862]  process_one_work+0x62c/0xb48
    [  387.971296]  worker_thread+0x468/0x5b0
    [  387.975317]  kthread+0x1c4/0x1e0
    [  387.978818]  ret_from_fork+0x10/0x20
    [  387.983014] ---[ end trace ]---
    
    This happens because the UAPI provides only seven configuration
    registers and we are reading the eighth position of this u32 array.
    
    Therefore, fix the out-of-bounds read in `v3d_csd_job_run()` by
    accessing only seven positions on the '__u32 [7]' array. The eighth
    register exists indeed on V3D 7.1, but it isn't currently used. That
    being so, let's guarantee that it remains unused and add a note that it
    could be set in a future patch.
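    
    A minimal sketch of bounding the loop by the UAPI array rather than by
    the number of hardware registers. The struct, the NUM_HW_CFG_REGS
    constant, and csd_write_cfg() are hypothetical and only model the
    situation described above:

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define NUM_HW_CFG_REGS 8 /* hypothetical: the hardware has one more register */

struct csd_args {
    uint32_t cfg[7]; /* the UAPI only carries seven values */
};

/*
 * Copy the configuration into the register file.
 * regs must point to at least NUM_HW_CFG_REGS entries.
 * Returns the number of registers written.
 */
size_t csd_write_cfg(uint32_t *regs, const struct csd_args *args)
{
    size_t i;

    /* Bound by the source array, never by the register count. */
    for (i = 0; i < ARRAY_SIZE(args->cfg); i++)
        regs[i] = args->cfg[i];

    /* regs[7] is deliberately left untouched. */
    return i;
}
```

    Deriving the bound from the array itself keeps the loop correct even if
    the UAPI struct is later extended.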
    
    Fixes: 0ad5bc1ce463 ("drm/v3d: fix up register addresses for V3D 7.x")
    Reported-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
    Signed-off-by: Maíra Canal <mcanal@igalia.com>
    Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240809152001.668314-1-mcanal@igalia.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe/display: stop calling domains_driver_remove twice [+ + +]

Author: Matthew Auld <matthew.auld@intel.com>
Date:   Wed May 22 11:22:00 2024 +0100

    drm/xe/display: stop calling domains_driver_remove twice
    
    [ Upstream commit 48d74a0a45201de4efa016fb2f556889db37ed28 ]
    
    Unclear why we call this twice.
    
    Signed-off-by: Matthew Auld <matthew.auld@intel.com>
    Cc: Andrzej Hajda <andrzej.hajda@intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-35-matthew.auld@intel.com
    Stable-dep-of: f4b2a0ae1a31 ("drm/xe: Fix opregion leak")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe/mmio: move mmio_fini over to devm [+ + +]

Author: Matthew Auld <matthew.auld@intel.com>
Date:   Wed May 22 11:21:57 2024 +0100

    drm/xe/mmio: move mmio_fini over to devm
    
    [ Upstream commit a0b834c8957a7d2848face008a12382a0ad11ffc ]
    
    Not valid to touch mmio once the device is removed, so make sure we
    unmap on removal and not just when driver instance goes away. Also set
    the mmio pointers to NULL to hopefully catch such issues more easily.
    
    Signed-off-by: Matthew Auld <matthew.auld@intel.com>
    Cc: Andrzej Hajda <andrzej.hajda@intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-32-matthew.auld@intel.com
    Stable-dep-of: 15939ca77d44 ("drm/xe: Fix tile fini sequence")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: Decouple job seqno and lrc seqno [+ + +]

Author: Matthew Brost <matthew.brost@intel.com>
Date:   Mon May 27 15:59:08 2024 +0200

    drm/xe: Decouple job seqno and lrc seqno
    
    [ Upstream commit 08f7200899ca72dec550af092ae424b7db099abd ]
    
    Tightly coupling these seqnos presents problems if alternative fences for
    jobs are used. Decouple these for correctness.
    
    v2:
    - Slightly reword commit message (Thomas)
    - Make sure the lrc fence ops are used in comparison (Thomas)
    - Assume seqno is unsigned rather than signed in format string (Thomas)
    
    Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-2-thomas.hellstrom@linux.intel.com
    Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: Do not dereference NULL job->fence in trace points [+ + +]

Author: Matthew Brost <matthew.brost@intel.com>
Date:   Tue Jun 4 22:50:41 2024 -0700

    drm/xe: Do not dereference NULL job->fence in trace points
    
    commit 5d30de4311d2d4165e78dc021c5cacb7496b3491 upstream.
    
    job->fence is not assigned until xe_sched_job_arm(), check for
    job->fence in xe_sched_job_seqno() so any usage of this function (trace
    points) does not result in NULL ptr dereference. Also check job->fence
    before assigning error in job trace points.
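    
    A sketch of the guarded accessor, using simplified stand-in types. The
    struct layouts and the name job_seqno() are hypothetical, and returning
    0 for an unarmed job is an assumption made for the example:

```c
#include <stddef.h>
#include <stdint.h>

struct fence {
    uint32_t seqno;
};

struct sched_job {
    struct fence *fence; /* stays NULL until the job is armed */
};

/* Safe to call from tracing code at any point in the job's life. */
uint32_t job_seqno(const struct sched_job *job)
{
    return job->fence ? job->fence->seqno : 0;
}
```

    Putting the check in the accessor covers every caller at once instead of
    relying on each trace point to test the pointer.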
    
    Fixes: 0ac7a2c745e8 ("drm/xe: Don't initialize fences at xe_sched_job_create()")
    Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240605055041.2082074-1-matthew.brost@intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/xe: Don't initialize fences at xe_sched_job_create() [+ + +]

Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Date:   Mon May 27 15:59:10 2024 +0200

    drm/xe: Don't initialize fences at xe_sched_job_create()
    
    [ Upstream commit 0ac7a2c745e8a42803378b944fa0f4455b7240f6 ]
    
    Pre-allocate but don't initialize fences at xe_sched_job_create(),
    and initialize / arm them instead at xe_sched_job_arm(). This
    makes it possible to move xe_sched_job_create() with its memory
    allocation out of any lock that is required for fence
    initialization, and that may not allow memory allocation under it.
    
    Replaces the struct dma_fence_array for parallel jobs with a
    struct dma_fence_chain, since the former doesn't allow
    a split-up between allocation and initialization.
    
    v2:
    - Rebase.
    - Don't always use the first lrc when initializing parallel
      lrc fences.
    - Use dma_fence_chain_contained() to access the lrc fences.
    
    v4:
    - Add an assert that job->lrc_seqno == fence->seqno.
      (Matthew Brost)
    
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Reviewed-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #v2
    Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-4-thomas.hellstrom@linux.intel.com
    Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: Fix missing workqueue destroy in xe_gt_pagefault [+ + +]

Author: Stuart Summers <stuart.summers@intel.com>
Date:   Sat Aug 17 02:47:30 2024 +0000

    drm/xe: Fix missing workqueue destroy in xe_gt_pagefault
    
    [ Upstream commit a6f78359ac75f24cac3c1bdd753c49c1877bcd82 ]
    
    On driver reload we never free up the memory for the pagefault and
    access counter workqueues. Add those destroy calls here.
    
    Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
    Signed-off-by: Stuart Summers <stuart.summers@intel.com>
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/c9a951505271dc3a7aee76de7656679f69c11518.1723862633.git.stuart.summers@intel.com
    (cherry picked from commit 7586fc52b14e0b8edd0d1f8a434e0de2078b7b2b)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: Fix opregion leak [+ + +]

Author: Lucas De Marchi <lucas.demarchi@intel.com>
Date:   Wed Jul 24 14:53:09 2024 -0700

    drm/xe: Fix opregion leak
    
    [ Upstream commit f4b2a0ae1a31fd3d1b5ca18ee08319b479cf9b5f ]
    
    Being part of the display, ideally the setup and cleanup would be done by
    display itself. However this is a bigger refactor that needs to be done
    on both i915 and xe. For now, just fix the leak:
    
    unreferenced object 0xffff8881a0300008 (size 192):
      comm "modprobe", pid 4354, jiffies 4295647021
      hex dump (first 32 bytes):
        00 00 87 27 81 88 ff ff 18 80 9b 00 00 c9 ff ff  ...'............
        18 81 9b 00 00 c9 ff ff 00 00 00 00 00 00 00 00  ................
      backtrace (crc 99260e31):
        [<ffffffff823ce65b>] kmemleak_alloc+0x4b/0x80
        [<ffffffff81493be2>] kmalloc_trace_noprof+0x312/0x3d0
        [<ffffffffa1345679>] intel_opregion_setup+0x89/0x700 [xe]
        [<ffffffffa125bfaf>] xe_display_init_noirq+0x2f/0x90 [xe]
        [<ffffffffa1199ec3>] xe_device_probe+0x7a3/0xbf0 [xe]
        [<ffffffffa11f3713>] xe_pci_probe+0x333/0x5b0 [xe]
        [<ffffffff81af6be8>] local_pci_probe+0x48/0xb0
        [<ffffffff81af8778>] pci_device_probe+0xc8/0x280
        [<ffffffff81d09048>] really_probe+0xf8/0x390
        [<ffffffff81d0937a>] __driver_probe_device+0x8a/0x170
        [<ffffffff81d09503>] driver_probe_device+0x23/0xb0
        [<ffffffff81d097b7>] __driver_attach+0xc7/0x190
        [<ffffffff81d0628d>] bus_for_each_dev+0x7d/0xd0
        [<ffffffff81d0851e>] driver_attach+0x1e/0x30
        [<ffffffff81d07ac7>] bus_add_driver+0x117/0x250
    
    Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240724215309.644423-1-lucas.demarchi@intel.com
    Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
    (cherry picked from commit 6f4e43a2f771b737d991142ec4f6d4b7ff31fbb4)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: Fix tile fini sequence [+ + +]

Author: Matthew Brost <matthew.brost@intel.com>
Date:   Fri Aug 9 16:28:30 2024 -0700

    drm/xe: Fix tile fini sequence
    
    [ Upstream commit 15939ca77d4424f736e1e4953b4da2351cc9689d ]
    
    Only set tile->mmio.regs to NULL if not the root tile in tile_fini. The
    root tile mmio regs are set up earlier in MMIO init, thus they should be set
    to NULL in mmio_fini.
    
    Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
    Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240809232830.3302251-1-matthew.brost@intel.com
    (cherry picked from commit 3396900aa273903639a1792afa4d23dc09bec291)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: Free job before xe_exec_queue_put [+ + +]

Author: Matthew Brost <matthew.brost@intel.com>
Date:   Tue Aug 20 13:23:09 2024 -0700

    drm/xe: Free job before xe_exec_queue_put
    
    [ Upstream commit 9e7f30563677fbeff62d368d5d2a5ac7aaa9746a ]
    
    Free job depends on job->vm being valid, the last xe_exec_queue_put can
    destroy the VM. Prevent UAF by freeing job before xe_exec_queue_put.
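    
    The ordering problem can be modelled with plain structs. This is a toy
    model, not the xe code: the types and functions are hypothetical, and a
    flag stands in for freed memory so the example stays well defined:

```c
#include <stdbool.h>

struct vm {
    bool alive;     /* false once the VM has been torn down */
    int freed_jobs; /* bookkeeping touched when a job is freed */
};

struct exec_queue {
    int refcount;
    struct vm *vm;
};

struct job {
    struct exec_queue *q;
    struct vm *vm;
};

/* Dropping the last reference tears down the VM. */
void exec_queue_put(struct exec_queue *q)
{
    if (--q->refcount == 0)
        q->vm->alive = false;
}

/* Freeing a job needs the VM; returns false if it is already gone. */
bool job_free(struct job *job)
{
    if (!job->vm->alive)
        return false; /* this would be the use-after-free */
    job->vm->freed_jobs++;
    return true;
}

/* Correct order: free the job first, drop the queue reference last. */
bool job_destroy(struct job *job)
{
    struct exec_queue *q = job->q;
    bool ok = job_free(job);

    exec_queue_put(q);
    return ok;
}
```

    Swapping the two calls in job_destroy() makes job_free() run against a
    VM that the put may already have destroyed.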
    
    Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
    Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240820202309.1260755-1-matthew.brost@intel.com
    (cherry picked from commit 32a42c93b74c8ca6d0915ea3eba21bceff53042f)
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: Relax runtime pm protection during execution [+ + +]

Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date:   Wed May 22 13:01:01 2024 -0400

    drm/xe: Relax runtime pm protection during execution
    
    [ Upstream commit ad1e331fc451a2cffc72ae193b843682ce237e24 ]
    
    Limit the protection only during moments of actual job execution,
    and introduce protection for guc submit fini, which is currently
    unprotected due to the absence of exec_queue life protection.
    
    In the regular use case scenario, user space will create an
    exec queue, and keep it alive to reuse that until it is done
    with that kind of workload.
    
    For the regular desktop cases, it means that the exec_queue
    is alive even on idle scenarios where display goes off. This
    is unacceptable since this would entirely block runtime PM
    indefinitely, blocking deeper Package-C state. This would be
    a wasteful drain of power.
    
    Cc: Matthew Brost <matthew.brost@intel.com>
    Tested-by: Francois Dugast <francois.dugast@intel.com>
    Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrigo.vivi@intel.com
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: reset mmio mappings with devm [+ + +]

Author: Matthew Auld <matthew.auld@intel.com>
Date:   Wed May 22 11:21:58 2024 +0100

    drm/xe: reset mmio mappings with devm
    
    [ Upstream commit c7117419784f612d59ee565145f722e8b5541fe6 ]
    
    Set our various mmio mappings to NULL. This should make it easier to
    catch something rogue trying to mess with mmio after device removal. For
    example, we might unmap everything and then start hitting some mmio
    address which has already been unmapped by us and then remapped by
    something else, causing all kinds of carnage.
    
    Signed-off-by: Matthew Auld <matthew.auld@intel.com>
    Cc: Andrzej Hajda <andrzej.hajda@intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-33-matthew.auld@intel.com
    Stable-dep-of: 15939ca77d44 ("drm/xe: Fix tile fini sequence")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/xe: Split lrc seqno fence creation up [+ + +]

Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Date:   Mon May 27 15:59:09 2024 +0200

    drm/xe: Split lrc seqno fence creation up
    
    [ Upstream commit e183910ae4015214475b3248ce0b4c70f104f254 ]
    
    Since sometimes a lock is required to initialize a seqno fence,
    and it might be desirable not to hold that lock while performing
    memory allocations, split the lrc seqno fence creation up into an
    allocation phase and an initialization phase.
    
    Since lrc seqno fences under the hood are hw_fences, do the same
    for these and remove the xe_hw_fence_create() function since it
    is not used anymore.
    
    Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Reviewed-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-3-thomas.hellstrom@linux.intel.com
    Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put")
    Signed-off-by: Sasha Levin <sashal@kernel.org>
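
The allocate-then-initialize split described in the entry above can be sketched in userspace C. This is only an illustration of the general pattern, not the xe driver's code: every `demo_` name is hypothetical, and a pthread mutex stands in for whatever lock protects seqno assignment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a seqno fence; not the xe driver's type. */
struct demo_fence {
    uint32_t seqno;
    int initialized;
};

/* Hypothetical context whose seqno counter is protected by a lock. */
struct demo_ctx {
    pthread_mutex_t lock;
    uint32_t next_seqno;
};

/* Phase 1: allocation only. It may fail, so call it without the lock held. */
static struct demo_fence *demo_fence_alloc(void)
{
    return calloc(1, sizeof(struct demo_fence));
}

/* Phase 2: initialization only. It allocates nothing, so it can run under the lock. */
static void demo_fence_init(struct demo_ctx *ctx, struct demo_fence *f)
{
    assert(ctx != NULL && f != NULL);
    f->seqno = ctx->next_seqno++;
    f->initialized = 1;
}

/* Caller: allocate first, then hold the lock just long enough to initialize. */
static struct demo_fence *demo_fence_create(struct demo_ctx *ctx)
{
    struct demo_fence *f = demo_fence_alloc();

    if (!f)
        return NULL;

    pthread_mutex_lock(&ctx->lock);
    demo_fence_init(ctx, f);
    pthread_mutex_unlock(&ctx->lock);
    return f;
}
```

Keeping the allocation outside the critical section means an allocation failure is handled before the lock is taken, and the lock is held only for the seqno assignment.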

filelock: fix name of file_lease slab cache [+ + +]

Author: Omar Sandoval <osandov@fb.com>
Date:   Mon Jul 29 15:48:12 2024 -0700

    filelock: fix name of file_lease slab cache
    
    [ Upstream commit 3f65f3c099bcb27949e712f39ba836f21785924a ]
    
    When struct file_lease was split out from struct file_lock, the name of
    the file_lock slab cache was copied to the new slab cache for
    file_lease. This name conflict causes confusion in /proc/slabinfo and
    /sys/kernel/slab. In particular, it caused failures in drgn's test case
    for slab cache merging.
    
    Link: https://github.com/osandov/drgn/blob/9ad29fd86499eb32847473e928b6540872d3d59a/tests/linux_kernel/helpers/test_slab.py#L81
    Fixes: c69ff4071935 ("filelock: split leases out of struct file_lock")
    Signed-off-by: Omar Sandoval <osandov@fb.com>
    Link: https://lore.kernel.org/r/2d1d053da1cafb3e7940c4f25952da4f0af34e38.1722293276.git.osandov@fb.com
    Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Linux: fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE [+ + +]

Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Aug 3 18:02:00 2024 -0400

    fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
    
    commit 9a2fa1472083580b6c66bdaf291f591e1170123a upstream.
    
    copy_fd_bitmaps(new, old, count) is expected to copy the first
    count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
    the rest with zeroes.  What it does is copying enough words
    (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
    That works fine, *if* all bits past the cutoff point are
    clear.  Otherwise we are risking garbage from the last word
    we'd copied.
    
    For most of the callers that is true - expand_fdtable() has
    count equal to old->max_fds, so there's no open descriptors
    past count, let alone fully occupied words in ->open_fds[],
    which is what bits in ->full_fds_bits[] correspond to.
    
    The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
    which is the smallest multiple of BITS_PER_LONG that covers all
    opened descriptors below max_fds.  In the common case (copying on
    fork()) max_fds is ~0U, so all opened descriptors will be below
    it and we are fine, by the same reasons why the call in expand_fdtable()
    is safe.
    
    Unfortunately, there is a case where max_fds is less than that
    and where we might, indeed, end up with junk in ->full_fds_bits[] -
    close_range(from, to, CLOSE_RANGE_UNSHARE) with
            * descriptor table being currently shared
            * 'to' being above the current capacity of descriptor table
            * 'from' being just under some chunk of opened descriptors.
    In that case we end up with observably wrong behaviour - e.g. spawn
    a child with CLONE_FILES, get all descriptors in range 0..127 open,
    then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
    up with descriptor #128, despite #64 being observably not open.
    
    The minimally invasive fix would be to deal with that in dup_fd().
    If this proves to add measurable overhead, we can go that way, but
    let's try to fix copy_fd_bitmaps() first.
    
    * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
    * make copy_fd_bitmaps() take the bitmap size in words, rather than
    bits; its 'count' argument is always a multiple of BITS_PER_LONG,
    so we are not losing any information, and that way we can use the
    same helper for all three bitmaps - compiler will see that count
    is a multiple of BITS_PER_LONG for the large ones, so it'll generate
    plain memcpy()+memset().
    
    Reproducer added to tools/testing/selftests/core/close_range_test.c
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/netfs/fscache_cookie: add missing "n_accesses" check [+ + +]

Author: Max Kellermann <max.kellermann@ionos.com>
Date:   Mon Jul 29 17:19:30 2024 +0100

    fs/netfs/fscache_cookie: add missing "n_accesses" check
    
    commit f71aa06398aabc2e3eaac25acdf3d62e0094ba70 upstream.
    
    This fixes a NULL pointer dereference bug due to a data race which
    looks like this:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000008
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43
      Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
      Workqueue: events_unbound netfs_rreq_write_to_cache_work
      RIP: 0010:cachefiles_prepare_write+0x30/0xa0
      Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10
      RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286
      RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000
      RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438
      RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001
      R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68
      R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00
      FS:  0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0
      Call Trace:
       <TASK>
       ? __die+0x1f/0x70
       ? page_fault_oops+0x15d/0x440
       ? search_module_extables+0xe/0x40
       ? fixup_exception+0x22/0x2f0
       ? exc_page_fault+0x5f/0x100
       ? asm_exc_page_fault+0x22/0x30
       ? cachefiles_prepare_write+0x30/0xa0
       netfs_rreq_write_to_cache_work+0x135/0x2e0
       process_one_work+0x137/0x2c0
       worker_thread+0x2e9/0x400
       ? __pfx_worker_thread+0x10/0x10
       kthread+0xcc/0x100
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x30/0x50
       ? __pfx_kthread+0x10/0x10
       ret_from_fork_asm+0x1b/0x30
       </TASK>
      Modules linked in:
      CR2: 0000000000000008
      ---[ end trace 0000000000000000 ]---
    
    This happened because fscache_cookie_state_machine() was slow and was
    still running while another process invoked fscache_unuse_cookie();
    this led to a fscache_cookie_lru_do_one() call, setting the
    FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by
    fscache_cookie_state_machine(), withdrawing the cookie via
    cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
    
    At the same time, yet another process invoked
    cachefiles_prepare_write(), which found a NULL pointer in this code
    line:
    
      struct cachefiles_object *object = cachefiles_cres_object(cres);
    
    The next line crashes, obviously:
    
      struct cachefiles_cache *cache = object->volume->cache;
    
    During cachefiles_prepare_write(), the "n_accesses" counter is
    non-zero (via fscache_begin_operation()).  The cookie must not be
    withdrawn until it drops to zero.
    
    The counter is checked by fscache_cookie_state_machine() before
    switching to FSCACHE_COOKIE_STATE_RELINQUISHING and
    FSCACHE_COOKIE_STATE_WITHDRAWING (in "case
    FSCACHE_COOKIE_STATE_FAILED"), but not for
    FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case
    FSCACHE_COOKIE_STATE_ACTIVE").
    
    This patch adds the missing check.  With a non-zero access counter,
    the function returns and the next fscache_end_cookie_access() call
    will queue another fscache_cookie_state_machine() call to handle the
    still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
    
    Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
    Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Link: https://lore.kernel.org/r/20240729162002.3436763-2-dhowells@redhat.com
    cc: Jeff Layton <jlayton@kernel.org>
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org
    cc: stable@vger.kernel.org
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
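
The missing check described in the entry above can be illustrated with a small single-threaded model. It is a sketch under stated assumptions, not the fscache code: the `demo_` types and functions are hypothetical, the state machine is called directly instead of being queued as work, and a real implementation would also need locking around the state transition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_ACTIVE, DEMO_LRU_DISCARDING };

struct demo_cookie {
    atomic_int n_accesses;      /* operations currently using the cookie */
    atomic_bool do_lru_discard; /* pending LRU discard request */
    enum demo_state state;
};

/* One step of the model state machine; true if the discard started. */
static bool demo_state_machine_step(struct demo_cookie *c)
{
    assert(c != NULL);

    if (c->state != DEMO_ACTIVE || !atomic_load(&c->do_lru_discard))
        return false;

    /* The added check: while accesses are in flight, keep the request pending. */
    if (atomic_load(&c->n_accesses) != 0)
        return false;

    atomic_store(&c->do_lru_discard, false);
    c->state = DEMO_LRU_DISCARDING;
    return true;
}

/* Ending the last access re-runs the state machine (called inline in this model). */
static bool demo_end_access(struct demo_cookie *c)
{
    if (atomic_fetch_sub(&c->n_accesses, 1) == 1)
        return demo_state_machine_step(c);
    return false;
}
```

In this model a discard request that arrives while an operation is running is not lost; it stays flagged and is handled once the access counter reaches zero.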

fuse: Initialize beyond-EOF page contents before setting uptodate [+ + +]

Author: Jann Horn <jannh@google.com>
Date:   Tue Aug 6 21:51:42 2024 +0200

    fuse: Initialize beyond-EOF page contents before setting uptodate
    
    commit 3c0da3d163eb32f1f91891efaade027fa9b245b9 upstream.
    
    fuse_notify_store(), unlike fuse_do_readpage(), does not enable page
    zeroing (because it can be used to change partial page contents).
    
    So fuse_notify_store() must be more careful to fully initialize page
    contents (including parts of the page that are beyond end-of-file)
    before marking the page uptodate.
    
    The current code can leave beyond-EOF page contents uninitialized, which
    makes these uninitialized page contents visible to userspace via mmap().
    
    This is an information leak, but only affects systems which do not
    enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the
    corresponding kernel command line parameter).
    
    Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2574
    Cc: stable@kernel.org
    Fixes: a1d75f258230 ("fuse: add store request")
    Signed-off-by: Jann Horn <jannh@google.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
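
The rule stated in the entry above (initialize everything beyond the written data before marking the page uptodate) can be sketched in userspace C. The `demo_` names, the fixed 4096-byte page size, and the `reaches_eof` parameter are assumptions made for illustration; this is not the fuse implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical page model: raw contents plus an "uptodate" flag. */
struct demo_page {
    unsigned char data[DEMO_PAGE_SIZE];
    bool uptodate;
};

/*
 * Store 'len' bytes at the start of the page. The page is marked
 * uptodate only when the write covers the whole page or ends at EOF,
 * and in that case the remainder is zeroed first so that no stale
 * bytes become visible to a later reader.
 */
static bool demo_store_page(struct demo_page *page, const void *src,
                            size_t len, bool reaches_eof)
{
    assert(page != NULL && len <= DEMO_PAGE_SIZE);

    memcpy(page->data, src, len);

    if (len == DEMO_PAGE_SIZE || reaches_eof) {
        memset(page->data + len, 0, DEMO_PAGE_SIZE - len);
        page->uptodate = true;
    }
    return page->uptodate;
}
```

A partial store that does not reach EOF leaves the page not uptodate in this model, so the stale bytes after the written range are never treated as valid contents.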

gpio: mlxbf3: Support shutdown() function [+ + +]

Author: Asmaa Mnebhi <asmaa@nvidia.com>
Date:   Tue Jun 11 13:15:09 2024 -0400

    gpio: mlxbf3: Support shutdown() function
    
    [ Upstream commit aad41832326723627ad8ac9ee8a543b6dca4454d ]
    
    During Linux graceful reboot, the GPIO interrupts are not disabled.
    Since the drivers are not removed during graceful reboot,
    the logic to call mlxbf3_gpio_irq_disable() is not triggered.
    Interrupts that remain enabled can cause issues on subsequent boots.
    
    For example, the mlxbf-gige driver contains PHY logic to bring up the link.
    If the gpio-mlxbf3 driver loads first, the mlxbf-gige driver
    will use a GPIO interrupt to bring up the link.
    Otherwise, it will use polling.
    The next time Linux boots and loads the drivers in this order, we encounter the issue:
    - mlxbf-gige loads first and uses polling while the GPIO10
      interrupt is still enabled from the previous boot. So if
      the interrupt triggers, there is nothing to clear it.
    - gpio-mlxbf3 loads.
    - i2c-mlxbf loads. The interrupt doesn't trigger for I2C
      because it is shared with the GPIO interrupt line which
      was not cleared.
    
    The solution is to add a shutdown function to the GPIO driver to clear and disable
    all interrupts. Also clear the interrupt after disabling it in mlxbf3_gpio_irq_disable().
    
    Fixes: 38a700efc510 ("gpio: mlxbf3: Add gpio driver support")
    Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
    Reviewed-by: David Thompson <davthompson@nvidia.com>
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Link: https://lore.kernel.org/r/20240611171509.22151-1-asmaa@nvidia.com
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

gtp: pull network headers in gtp_dev_xmit() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Aug 8 13:24:55 2024 +0000

    gtp: pull network headers in gtp_dev_xmit()
    
    [ Upstream commit 3a3be7ff9224f424e485287b54be00d2c6bd9c40 ]
    
    syzbot/KMSAN reported use of uninit-value in gtp_dev_xmit() [1]
    
    We must make sure the IPv4 or IPv6 header is pulled in skb->head
    before accessing fields in them.
    
    Use pskb_inet_may_pull() to fix this issue.
    
    [1]
    BUG: KMSAN: uninit-value in ipv6_pdp_find drivers/net/gtp.c:220 [inline]
     BUG: KMSAN: uninit-value in gtp_build_skb_ip6 drivers/net/gtp.c:1229 [inline]
     BUG: KMSAN: uninit-value in gtp_dev_xmit+0x1424/0x2540 drivers/net/gtp.c:1281
      ipv6_pdp_find drivers/net/gtp.c:220 [inline]
      gtp_build_skb_ip6 drivers/net/gtp.c:1229 [inline]
      gtp_dev_xmit+0x1424/0x2540 drivers/net/gtp.c:1281
      __netdev_start_xmit include/linux/netdevice.h:4913 [inline]
      netdev_start_xmit include/linux/netdevice.h:4922 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3596
      __dev_queue_xmit+0x358c/0x5610 net/core/dev.c:4423
      dev_queue_xmit include/linux/netdevice.h:3105 [inline]
      packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
      packet_snd net/packet/af_packet.c:3145 [inline]
      packet_sendmsg+0x90e3/0xa3a0 net/packet/af_packet.c:3177
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      __sys_sendto+0x685/0x830 net/socket.c:2204
      __do_sys_sendto net/socket.c:2216 [inline]
      __se_sys_sendto net/socket.c:2212 [inline]
      __x64_sys_sendto+0x125/0x1d0 net/socket.c:2212
      x64_sys_call+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:45
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    Uninit was created at:
      slab_post_alloc_hook mm/slub.c:3994 [inline]
      slab_alloc_node mm/slub.c:4037 [inline]
      kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4080
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:583
      __alloc_skb+0x363/0x7b0 net/core/skbuff.c:674
      alloc_skb include/linux/skbuff.h:1320 [inline]
      alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6526
      sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2815
      packet_alloc_skb net/packet/af_packet.c:2994 [inline]
      packet_snd net/packet/af_packet.c:3088 [inline]
      packet_sendmsg+0x749c/0xa3a0 net/packet/af_packet.c:3177
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      __sys_sendto+0x685/0x830 net/socket.c:2204
      __do_sys_sendto net/socket.c:2216 [inline]
      __se_sys_sendto net/socket.c:2212 [inline]
      __x64_sys_sendto+0x125/0x1d0 net/socket.c:2212
      x64_sys_call+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:45
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    CPU: 0 UID: 0 PID: 7115 Comm: syz.1.515 Not tainted 6.11.0-rc1-syzkaller-00043-g94ede2a3e913 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
    
    Fixes: 999cb275c807 ("gtp: add IPv6 support")
    Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Harald Welte <laforge@gnumonks.org>
    Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Link: https://patch.msgid.link/20240808132455.3413916-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
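
The idea behind the fix in the entry above, checking that a complete network header is available before reading its fields, can be shown with a plain byte buffer. This userspace sketch does not use skbs or pskb_inet_may_pull(); the `demo_` names are hypothetical, and only the fixed 20-byte IPv4 and 40-byte IPv6 header sizes are taken as given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IPV4_HDR_LEN 20u
#define DEMO_IPV6_HDR_LEN 40u

enum demo_proto { DEMO_PROTO_IPV4, DEMO_PROTO_IPV6 };

/* True only if 'len' bytes cover the fixed header of the given protocol. */
static bool demo_header_readable(enum demo_proto proto, size_t len)
{
    switch (proto) {
    case DEMO_PROTO_IPV4:
        return len >= DEMO_IPV4_HDR_LEN;
    case DEMO_PROTO_IPV6:
        return len >= DEMO_IPV6_HDR_LEN;
    }
    return false;
}

/* Copy the IPv6 source address (header bytes 8..23) only after the check. */
static bool demo_ipv6_saddr(const uint8_t *buf, size_t len, uint8_t out[16])
{
    assert(buf != NULL && out != NULL);

    if (!demo_header_readable(DEMO_PROTO_IPV6, len))
        return false; /* short packet: refuse instead of reading past the data */

    memcpy(out, buf + 8, 16);
    return true;
}
```

A caller that receives `false` would drop the packet rather than look up a PDP context with an address that was never written.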

HID: wacom: Defer calculation of resolution until resolution_code is known [+ + +]

Author: Jason Gerecke <jason.gerecke@wacom.com>
Date:   Tue Jul 30 08:51:55 2024 -0700

    HID: wacom: Defer calculation of resolution until resolution_code is known
    
    commit 1b8f9c1fb464968a5b18d3acc1da8c00bad24fad upstream.
    
    The Wacom driver maps the HID_DG_TWIST usage to ABS_Z (rather than ABS_RZ)
    for historic reasons. When the code to support twist was introduced in
    commit 50066a042da5 ("HID: wacom: generic: Add support for height, tilt,
    and twist usages"), we were careful to write it in such a way that it had
    HID calculate the resolution of the twist axis assuming ABS_RZ instead
    (so that we would get correct angular behavior). This was broken with
    the introduction of commit 08a46b4190d3 ("HID: wacom: Set a default
    resolution for older tablets"), which moved the resolution calculation
    to occur *before* the adjustment from ABS_Z to ABS_RZ occurred.
    
    This commit moves the calculation of resolution after the point that
    we are finished setting things up for its proper use.
    
    Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
    Fixes: 08a46b4190d3 ("HID: wacom: Set a default resolution for older tablets")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jiri Kosina <jkosina@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume [+ + +]

Author: Andi Shyti <andi.shyti@kernel.org>
Date:   Mon Aug 12 21:40:28 2024 +0200

    i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
    
    commit 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 upstream.
    
    Add the missing geni_icc_disable() call before returning in the
    geni_i2c_runtime_resume() function.
    
    Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing
    geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
    disabling the interconnect in one case.
    
    Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support")
    Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
    Cc: stable@vger.kernel.org # v5.9+
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: tegra: Do not mark ACPI devices as irq safe [+ + +]

Author: Breno Leitao <leitao@debian.org>
Date:   Tue Aug 13 09:12:53 2024 -0700

    i2c: tegra: Do not mark ACPI devices as irq safe
    
    commit 14d069d92951a3e150c0a81f2ca3b93e54da913b upstream.
    
    On ACPI machines, the tegra i2c module encounters an issue due to a
    mutex being called inside a spinlock. This leads to the following bug:
    
            BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
            ...
    
            Call trace:
            __might_sleep
            __mutex_lock_common
            mutex_lock_nested
            acpi_subsys_runtime_resume
            rpm_resume
            tegra_i2c_xfer
    
    The problem arises because during __pm_runtime_resume(), the spinlock
    &dev->power.lock is acquired before rpm_resume() is called. Later,
    rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
    mutexes, triggering the error.
    
    To address this issue, devices on ACPI are now marked as not IRQ-safe,
    considering the dependency of acpi_subsys_runtime_resume() on mutexes.
    
    Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support")
    Cc: <stable@vger.kernel.org> # v5.17+
    Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
    Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ice: fix ICE_LAST_OFFSET formula [+ + +]

Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date:   Wed Aug 7 12:53:25 2024 +0200

    ice: fix ICE_LAST_OFFSET formula
    
    [ Upstream commit b966ad832942b5a11e002f9b5ef102b08425b84a ]
    
    For bigger PAGE_SIZE archs, ice driver works on 3k Rx buffers.
    Therefore, ICE_LAST_OFFSET should take into account ICE_RXBUF_3072, not
    ICE_RXBUF_2048.
    
    Fixes: 7237f5b0dba4 ("ice: introduce legacy Rx flag")
    Suggested-by: Luiz Capitulino <luizcap@redhat.com>
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: fix page reuse when PAGE_SIZE is over 8k [+ + +]

Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date:   Wed Aug 7 12:53:24 2024 +0200

    ice: fix page reuse when PAGE_SIZE is over 8k
    
    [ Upstream commit 50b2143356e888777fc5bca023c39f34f404613a ]
    
    Architectures that have PAGE_SIZE >= 8192 such as arm64 should act the
    same as x86 currently, meaning reuse of a page should only take place
    when no one else is busy with it.
    
    Do two things independently of underlying PAGE_SIZE:
    - store the page count under ice_rx_buf::pgcnt
    - then act upon its value vs ice_rx_buf::pagecnt_bias when making the
      decision regarding page reuse
    
    Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support")
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
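
The page reuse decision described in the entry above compares a stored page count with the driver's own bias. A minimal sketch follows, assuming that at most one reference beyond the bias may be outstanding for the page to count as idle; that threshold and the `demo_` names are assumptions for illustration, while the `pgcnt` and `pagecnt_bias` field names come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two counters named in the commit message. */
struct demo_rx_buf {
    unsigned int pgcnt;        /* page reference count sampled for this buffer */
    unsigned int pagecnt_bias; /* references still held by the driver itself */
};

/*
 * Reuse the page only when no one else is busy with it, i.e. when the
 * sampled page count exceeds the driver's bias by at most one
 * (assumed threshold). The same test applies for every PAGE_SIZE.
 */
static bool demo_can_reuse_page(const struct demo_rx_buf *buf)
{
    assert(buf != NULL && buf->pgcnt >= buf->pagecnt_bias);
    return buf->pgcnt - buf->pagecnt_bias <= 1;
}
```

Sampling the count once and storing it means the later decision works on a consistent value instead of re-reading a counter that other users may be changing.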

ice: fix truesize operations for PAGE_SIZE >= 8192 [+ + +]

Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date:   Wed Aug 7 12:53:26 2024 +0200

    ice: fix truesize operations for PAGE_SIZE >= 8192
    
    [ Upstream commit d53d4dcce69be5773e2d0878c9899ebfbf58c393 ]
    
    When working on multi-buffer packet on arch that has PAGE_SIZE >= 8192,
    truesize is calculated and stored in xdp_buff::frame_sz per each
    processed Rx buffer. This means that frame_sz will contain the truesize
    based on last received buffer, but commit 1dc1a7e7f410 ("ice:
    Centrallize Rx buffer recycling") assumed this value will be constant
    for each buffer, which breaks the page recycling scheme and messes up the
    way we update the page::page_offset.
    
    To fix this, let us work on constant truesize when PAGE_SIZE >= 8192
    instead of basing this on size of a packet read from Rx descriptor. This
    way we can simplify the code and avoid calculating truesize per each
    received frame and on top of that when using
    xdp_update_skb_shared_info(), current formula for truesize update will
    be valid.
    
    This means ice_rx_frame_truesize() can be removed altogether.
    Furthermore, first call to it within ice_clean_rx_irq() for 4k PAGE_SIZE
    was redundant as xdp_buff::frame_sz is initialized via xdp_init_buff()
    in ice_vsi_cfg_rxq(). This should have been removed at the point where
    xdp_buff struct started to be a member of ice_rx_ring and it was no
    longer a stack based variable.
    
    There are two fixes tags as my understanding is that the first one
    exposed us to broken truesize and page_offset handling and then second
    introduced broken skb_shared_info update in ice_{construct,build}_skb().
    
    Reported-and-tested-by: Luiz Capitulino <luizcap@redhat.com>
    Closes: https://lore.kernel.org/netdev/8f9e2a5c-fd30-4206-9311-946a06d031bb@redhat.com/
    Fixes: 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling")
    Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
    Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: use internal pf id instead of function number [+ + +]

Author: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Date:   Mon Aug 19 09:17:42 2024 +0200

    ice: use internal pf id instead of function number
    
    [ Upstream commit 503ab6ee40fc103ea55cc9e50bb879e571d65aac ]
    
    Always use the same PF id in the devlink port number. When the PF is
    passed through to a VM, the bus info function number can be any value.
    
    Fixes: 2ae0aa4758b0 ("ice: Move devlink port to PF/VF struct")
    Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
    Suggested-by: Jiri Pirko <jiri@resnulli.us>
    Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

igb: cope with large MAX_SKB_FRAGS [+ + +]

Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Aug 16 17:20:34 2024 +0200

    igb: cope with large MAX_SKB_FRAGS
    
    [ Upstream commit 8aba27c4a5020abdf60149239198297f88338a8d ]
    
    Sabrina reports that the igb driver does not cope well with large
    MAX_SKB_FRAGS values: setting MAX_SKB_FRAGS to 45 causes payload
    corruption on TX.
    
    An easy reproducer is to run ssh to connect to the machine.  With
    MAX_SKB_FRAGS=17 it works, with MAX_SKB_FRAGS=45 it fails.  This has
    been reported originally in
    https://bugzilla.redhat.com/show_bug.cgi?id=2265320
    
    The root cause of the issue is that the driver does not take into
    account properly the (possibly large) shared info size when selecting
    the ring layout, and will try to fit two packets inside the same 4K
    page even when the 1st fraglist will trample over the 2nd head.
    
    Address the issue by checking if 2K buffers are insufficient.
    
    Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
    Reported-by: Jan Tluka <jtluka@redhat.com>
    Reported-by: Jirka Hladky <jhladky@redhat.com>
    Reported-by: Sabrina Dubroca <sd@queasysnail.net>
    Tested-by: Sabrina Dubroca <sd@queasysnail.net>
    Tested-by: Corinna Vinschen <vinschen@redhat.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
    Link: https://patch.msgid.link/20240816152034.1453285-1-vinschen@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: Fix packet still tx after gate close by reducing i226 MAC retry buffer [+ + +]

Author: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Date:   Sat Jul 6 11:38:07 2024 -0400

    igc: Fix packet still tx after gate close by reducing i226 MAC retry buffer
    
    [ Upstream commit e037a26ead187901f83cad9c503ccece5ff6817a ]
    
    Testing uncovered that even when the taprio gate is closed, some packets
    still transmit.
    
    According to i225/6 hardware errata [1], traffic might overflow the
    planned QBV window. This happens because MAC maintains an internal buffer,
    primarily for supporting half duplex retries. Therefore, even when the
    gate closes, residual MAC data in the buffer may still transmit.
    
    To mitigate this for i226, reduce the MAC's internal buffer from 192 bytes
    to the recommended 88 bytes by modifying the RETX_CTL register value.
    
    This follows guidelines from:
    [1] Ethernet Controller I225/I226 Spec Update Rev 2.1 Errata Item 9:
        TSN: Packet Transmission Might Cross Qbv Window
    [2] I225/6 SW User Manual Rev 1.2.4: Section 8.11.5 Retry Buffer Control
    
    Note that the RETX_CTL register can't be used in TSN mode because half
    duplex feature cannot coexist with TSN.
    
    Test Steps:
    1.  Send taprio cmd to board A:
        tc qdisc replace dev enp1s0 parent root handle 100 taprio \
        num_tc 4 \
        map 3 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
        queues 1@0 1@1 1@2 1@3 \
        base-time 0 \
        sched-entry S 0x07 500000 \
        sched-entry S 0x0f 500000 \
        flags 0x2 \
        txtime-delay 0
    
        Note that for TC3, gate should open for 500us and close for another
        500us.
    
    3.  Take tcpdump log on Board B.
    
    4.  Send udp packets via UDP tai app from Board A to Board B.
    
    5.  Analyze the tcpdump log with wireshark on Board B. Ensure that the
        total time from the first to the last packet received during one cycle
        for TC3 does not exceed 500us.
    
    Fixes: 43546211738e ("igc: Add new device ID's")
    Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
    Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: Fix qbv tx latency by setting gtxoffset [+ + +]

Author: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Date:   Sun Jul 7 08:53:18 2024 -0400

    igc: Fix qbv tx latency by setting gtxoffset
    
    [ Upstream commit 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464 ]
    
    A large tx latency issue was discovered during testing when only QBV was
    enabled. The issue occurs because gtxoffset was not set when QBV was
    active; it was only set when launch time was active.
    
    The patch "igc: Correct the launchtime offset" only sets gtxoffset when
    the launchtime_enable field is set by the user. Enabling launchtime_enable
    ultimately sets the register IGC_TXQCTL_QUEUE_MODE_LAUNCHT (referred to as
    LaunchT in the SW user manual).
    
    Section 7.5.2.6 of the IGC i225/6 SW User Manual Rev 1.2.4 states:
    "The latency between transmission scheduling (launch time) and the
    time the packet is transmitted to the network is listed in Table 7-61."
    
    However, the patch misinterprets the phrase "launch time" in that section
    by assuming it specifically refers to the LaunchT register, whereas it
    actually denotes the generic term for when a packet is released from the
    internal buffer to the MAC transmit logic.
    
    This launch time, as per that section, also implicitly refers to the QBV
    gate open time, where a packet waits in the buffer for the QBV gate to
    open. Therefore, latency applies whenever QBV is in use. TSN features such
    as QBU and QAV reuse QBV, making the latency universal to TSN features.
    
    Discussed with i226 HW owner (Shalev, Avi) and we were in agreement that
    the term "launch time" used in Section 7.5.2.6 is not clear and can be
    easily misinterpreted. Avi will update this section to:
    "When TQAVCTRL.TRANSMIT_MODE = TSN, the latency between transmission
    scheduling and the time the packet is transmitted to the network is listed
    in Table 7-61."
    
    Fix this issue by using igc_tsn_is_tx_mode_in_tsn() as a condition to
    write to gtxoffset, aligning with the newly updated SW User Manual.
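    
    As a rough illustration of the condition change described above, the
    sketch below models the decision with two plain booleans. The function
    names are hypothetical and do not appear in the driver; only the
    launchtime_enable field and the "tx mode is TSN" check come from this
    commit message.

```python
def writes_gtxoffset_before(launchtime_enable: bool, tx_mode_tsn: bool) -> bool:
    """Before the fix: gtxoffset is written only if launch time is enabled."""
    return launchtime_enable


def writes_gtxoffset_after(launchtime_enable: bool, tx_mode_tsn: bool) -> bool:
    """After the fix: gtxoffset is written whenever the tx mode is TSN."""
    return tx_mode_tsn


# QBV-only configuration: TSN tx mode is on, launch time is not requested.
qbv_only = dict(launchtime_enable=False, tx_mode_tsn=True)
print(writes_gtxoffset_before(**qbv_only), writes_gtxoffset_after(**qbv_only))
```

    In the QBV-only case the old condition skips the write, which matches
    the latency problem reported above.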
    
    Tested:
    1. Enrol taprio on talker board
       base-time 0
       cycle-time 1000000
       flags 0x2
       index 0 cmd S gatemask 0x1 interval1
       index 0 cmd S gatemask 0x1 interval2
    
       Note:
       interval1 = interval for a 64 bytes packet to go through
       interval2 = cycle-time - interval1
    
    2. Take tcpdump on listener board
    
    3. Use udp tai app on talker to send packets to listener
    
    4. Check the timestamp on listener via wireshark
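    
    As a rough worked example of the interval1 / interval2 note in step 1,
    the sketch below estimates the wire time of a 64-byte frame and the
    remainder of the cycle. The helper name and the choice to count 20 extra
    bytes (preamble, start-of-frame delimiter, inter-frame gap) are
    assumptions for illustration; the values actually used in the test are
    not stated here.

```python
CYCLE_TIME_NS = 1_000_000  # cycle-time from the taprio settings in step 1

# Assumption: preamble + SFD (8 bytes) and inter-frame gap (12 bytes)
# are counted as part of the time a frame occupies the wire.
WIRE_OVERHEAD_BYTES = 20


def frame_time_ns(frame_bytes: int, link_mbps: int) -> int:
    """Time in ns to serialize one frame, rounded up to a whole ns."""
    bits = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    # bits / (Mbit/s) gives microseconds, so scale by 1000 to get ns.
    return -(-bits * 1000 // link_mbps)


for speed in (100, 1000, 2500):
    interval1 = frame_time_ns(64, speed)
    interval2 = CYCLE_TIME_NS - interval1
    print(f"{speed} Mbps: interval1={interval1} ns, interval2={interval2} ns")
```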
    
    Test Result:
    100 Mbps: 113 ~ 193 ns
    1000 Mbps: 52 ~ 84 ns
    2500 Mbps: 95 ~ 223 ns
    
    Note that the test result is similar to the patch "igc: Correct the
    launchtime offset".
    
    Fixes: 790835fcc0cb ("igc: Correct the launchtime offset")
    Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: Fix qbv_config_change_errors logics [+ + +]

Author: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Date:   Sun Jul 7 08:53:16 2024 -0400

    igc: Fix qbv_config_change_errors logics
    
    [ Upstream commit f8d6acaee9d35cbff3c3cfad94641666c596f8da ]
    
    When the user issues these commands:
    1. Either a) or b)
       a) mqprio with hardware offload disabled
       b) taprio with txtime-assist feature enabled
    2. etf
    3. tc qdisc delete
    4. taprio with base time in the past
    
    At step 4, qbv_config_change_errors is wrongly increased by 1.
    
    Excerpt from IEEE 802.1Q-2018 8.6.9.3.1:
    "If AdminBaseTime specifies a time in the past, and the current schedule
    is running, then: Increment ConfigChangeError counter"
    
    qbv_config_change_errors should only increase if base time is in the past
    and a taprio schedule is already running. From the user's perspective,
    taprio was not active when first triggered at step 4. However, i225/6
    reuses qbv for etf, so qbv is
    enabled with a dummy schedule at step 2 where it enters
    igc_tsn_enable_offload() and qbv_count got incremented to 1. At step 4, it
    enters igc_tsn_enable_offload() again, qbv_count is incremented to 2.
    Because taprio is running, tc_setup_type is TC_SETUP_QDISC_ETF and
    qbv_count > 1, qbv_config_change_errors value got incremented.
    
    This issue happens due to reliance on qbv_count field where a non-zero
    value indicates that taprio is running. But qbv_count increases
    regardless of whether taprio is triggered by the user or by another TSN
    feature. This does not align with the qbv_config_change_errors
    expectation, which is only concerned with taprio triggered by the user.
    
    Fix this by relocating the qbv_config_change_errors logic to
    igc_save_qbv_schedule(), eliminating reliance on qbv_count and its
    inaccuracies from i225/6's multiple uses of qbv feature for other TSN
    features.
    
    The new function, igc_tsn_is_taprio_activated_by_user(), uses the
    taprio_offload_enable field to indicate that the currently running taprio
    was triggered by the user rather than by a non-qbv feature like etf.
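    
    The difference between the two heuristics can be sketched with a small
    model that follows the IEEE excerpt quoted above. The Adapter class and
    enable_qbv() helper are hypothetical; only the qbv_count and
    taprio_offload_enable names come from this commit message, and step 3
    (tc qdisc delete) is not modeled because the message describes the count
    going from 1 to 2.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    """Hypothetical stand-in for the driver state named in the commit message."""
    qbv_count: int = 0
    taprio_offload_enable: bool = False
    errors_old: int = 0  # counter as driven by the qbv_count heuristic
    errors_new: int = 0  # counter as driven by taprio_offload_enable


def enable_qbv(adapter: Adapter, by_user_taprio: bool, base_time_in_past: bool) -> None:
    """Model one pass through the offload path for etf or user taprio."""
    # New logic: count an error only if a user-requested taprio schedule is
    # already running and the new base time lies in the past.
    if by_user_taprio and adapter.taprio_offload_enable and base_time_in_past:
        adapter.errors_new += 1

    # Old logic: qbv_count grows for every QBV user, including etf.
    adapter.qbv_count += 1
    if by_user_taprio and adapter.qbv_count > 1 and base_time_in_past:
        adapter.errors_old += 1

    if by_user_taprio:
        adapter.taprio_offload_enable = True


adapter = Adapter()
enable_qbv(adapter, by_user_taprio=False, base_time_in_past=False)  # step 2: etf
enable_qbv(adapter, by_user_taprio=True, base_time_in_past=True)    # step 4: taprio
print(adapter.errors_old, adapter.errors_new)  # prints "1 0"
```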
    
    Fixes: ae4fe4698300 ("igc: Add qbv_config_change_errors counter")
    Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: Fix reset adapter logics when tx mode change [+ + +]

Author: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Date:   Sun Jul 7 08:53:17 2024 -0400

    igc: Fix reset adapter logics when tx mode change
    
    [ Upstream commit 0afeaeb5dae86aceded0d5f0c3a54d27858c0c6f ]
    
    Following the "igc: Fix TX Hang issue when QBV Gate is closed" changes,
    remaining issues with the reset adapter logic in igc_tsn_offload_apply()
    have been observed:
    
    1. The reset adapter logics for i225 and i226 differ, although they should
       be the same according to the guidelines in I225/6 HW Design Section
       7.5.2.1 on software initialization during tx mode changes.
    2. The i225 resets adapter every time, even though tx mode doesn't change.
       This occurs solely based on the condition igc_is_device_id_i225() when
       calling schedule_work().
    3. i226 doesn't reset adapter for tsn->legacy tx mode changes. It only
       resets adapter for legacy->tsn tx mode transitions.
    4. qbv_count introduced in the patch is actually not needed; in this
       context, a non-zero value of qbv_count is used to indicate if tx mode
       was unconditionally set to tsn in igc_tsn_enable_offload(). This could
       be replaced by checking the existing register
       IGC_TQAVCTRL_TRANSMIT_MODE_TSN bit.
    
    This patch resolves all issues and enters schedule_work() to reset the
    adapter only when changing tx mode. It also removes reliance on qbv_count.
    
    qbv_count field will be removed in a future patch.
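    
    A minimal sketch of the "reset only when tx mode changes" rule, assuming
    the current mode is read from a single register bit. The constant value
    and both function names are illustrative assumptions, not the driver's
    definitions.

```python
TRANSMIT_MODE_TSN = 0x1  # assumed bit position, for illustration only


def is_tx_mode_tsn(tqavctrl: int) -> bool:
    """Report whether the (modeled) register says the tx mode is TSN."""
    return bool(tqavctrl & TRANSMIT_MODE_TSN)


def needs_reset(tqavctrl: int, want_tsn: bool) -> bool:
    """Reset only when the requested tx mode differs from the current one."""
    return is_tx_mode_tsn(tqavctrl) != want_tsn


print(needs_reset(0x0, True))   # legacy -> tsn
print(needs_reset(0x1, True))   # tsn -> tsn
print(needs_reset(0x1, False))  # tsn -> legacy
```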
    
    Tests run:
    
    1. Verify reset adapter behaviour in i225/6:
       a) Enrol a new GCL
          Reset adapter observed (tx mode change legacy->tsn)
       b) Enrol a new GCL without deleting qdisc
          No reset adapter observed (tx mode remain tsn->tsn)
       c) Delete qdisc
          Reset adapter observed (tx mode change tsn->legacy)
    
    2. Tested scenario from "igc: Fix TX Hang issue when QBV Gate is closed"
       to confirm it remains resolved.
    
    Fixes: 175c241288c0 ("igc: Fix TX Hang issue when QBV Gate is closed")
    Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Input: i8042 - add forcenorestore quirk to leave controller untouched even on s3 [+ + +]

Author: Werner Sembach <wse@tuxedocomputers.com>
Date:   Thu Jan 4 19:31:17 2024 +0100

    Input: i8042 - add forcenorestore quirk to leave controller untouched even on s3
    
    commit 3d765ae2daccc570b3f4fbcb57eb321b12cdded2 upstream.
    
    On s3 resume the i8042 driver tries to restore the controller to a known
    state by reinitializing things. However, this can confuse the controller,
    with varying effects: mostly, keyboards that are occasionally unresponsive
    after resume.
    
    These issues do not arise on s0ix resume, as there the controller is
    assumed to have preserved its state from before suspend.
    
    This patch adds a quirk for devices where the reinitialization on s3 resume
    is not needed and might be harmful as described above. It does this by
    using the s0ix resume code path at selected locations.
    
    This new quirk goes beyond what the preexisting reset=never quirk does,
    which only skips some reinitialization steps.
    
    Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Link: https://lore.kernel.org/r/20240104183118.779778-2-wse@tuxedocomputers.com
    Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: i8042 - use new forcenorestore quirk to replace old buggy quirk combination [+ + +]

Author: Werner Sembach <wse@tuxedocomputers.com>
Date:   Thu Jan 4 19:31:18 2024 +0100

    Input: i8042 - use new forcenorestore quirk to replace old buggy quirk combination
    
    commit aaa4ca873d3da768896ffc909795359a01e853ef upstream.
    
    The old quirk combination sometimes causes a laggy keyboard after boot. With
    the new quirk the initial issue of an unresponsive keyboard after s3 resume
    is also fixed, but it doesn't have the negative side effect of the
    sometimes laggy keyboard.
    
    Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Link: https://lore.kernel.org/r/20240104183118.779778-3-wse@tuxedocomputers.com
    Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: MT - limit max slots [+ + +]

Author: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date:   Mon Jul 29 21:51:30 2024 +0900

    Input: MT - limit max slots
    
    commit 99d3bf5f7377d42f8be60a6b9cb60fb0be34dceb upstream.
    
    syzbot is reporting a too-large allocation in input_mt_init_slots(),
    because num_slots is supplied from userspace using ioctl(UI_DEV_CREATE).
    
    Since nobody knows the possible max slots, this patch chose 1024.
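    
    The kind of bound check described here can be sketched as follows. The
    constant name, the function name, and the choice of -EINVAL as the error
    are assumptions for illustration; only the limit of 1024 comes from this
    commit message.

```python
import errno

MAX_MT_SLOTS = 1024  # limit chosen by the patch


def mt_init_slots(num_slots: int) -> int:
    """Return 0 on success or a negative errno, mimicking kernel convention."""
    if num_slots > MAX_MT_SLOTS:
        # Reject oversized, userspace-controlled requests before allocating.
        return -errno.EINVAL
    # A real implementation would allocate per-slot state here.
    return 0


print(mt_init_slots(10), mt_init_slots(1 << 20))
```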
    
    Reported-by: syzbot <syzbot+0122fa359a69694395d5@syzkaller.appspotmail.com>
    Closes: https://syzkaller.appspot.com/bug?extid=0122fa359a69694395d5
    Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: George Kennedy <george.kennedy@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring/kbuf: sanitize peek buffer setup [+ + +]

Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Aug 20 18:31:58 2024 -0600

    io_uring/kbuf: sanitize peek buffer setup
    
    [ Upstream commit e0ee967630c8ee67bb47a5b38d235cd5a8789c48 ]
    
    Harden the buffer peeking a bit, by adding a sanity check for it having
    a valid size. Outside of that, arg->max_len is a size_t, though it's
    only ever set to a 32-bit value (as it's governed by MAX_RW_COUNT).
    Bump our needed check to a size_t so we know it fits. Finally, cap the
    calculated needed iov value to the PEEK_MAX_IMPORT, which is the
    maximum number of segments that should be peeked.
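    
    The three checks above can be sketched roughly as below. The function
    name is hypothetical and the value given to PEEK_MAX_IMPORT is an
    assumption; the sketch only shows the shape of the size check, the
    ceiling division, and the cap.

```python
PEEK_MAX_IMPORT = 256  # assumed value, for illustration


def segments_to_peek(max_len: int, buf_len: int, nr_avail: int) -> int:
    """How many buffers to peek to cover max_len bytes, with sanity checks."""
    if buf_len == 0:
        # Sanity check: a zero-sized buffer can never satisfy the request.
        raise ValueError("buffer has no valid size")
    needed = -(-max_len // buf_len)        # ceiling division
    needed = min(needed, PEEK_MAX_IMPORT)  # cap the number of segments
    return min(nr_avail, needed)


print(segments_to_peek(4097, 1024, 100))
```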
    
    Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

io_uring/napi: check napi_enabled in io_napi_add() before proceeding [+ + +]

Author: Olivier Langlois <olivier@trillion01.com>
Date:   Sun Aug 11 14:07:11 2024 -0400

    io_uring/napi: check napi_enabled in io_napi_add() before proceeding
    
    [ Upstream commit 84f2eecf95018386c145ada19bb45b03bdb80d9e ]
    
    Doing so avoids the overhead of adding napi ids to all the rings that do
    not enable napi.
    
    If no id is added to napi_list because napi is disabled,
    __io_napi_busy_loop() will not be called.
    
    Signed-off-by: Olivier Langlois <olivier@trillion01.com>
    Fixes: b4ccc4dd1330 ("io_uring/napi: enable even with a timeout of 0")
    Link: https://lore.kernel.org/r/bd989ccef5fda14f5fd9888faf4fefcf66bd0369.1723400131.git.olivier@trillion01.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

io_uring/napi: Remove unnecessary s64 cast [+ + +]

Author: Thorsten Blum <thorsten.blum@toblux.com>
Date:   Wed Jul 10 03:05:21 2024 +0200

    io_uring/napi: Remove unnecessary s64 cast
    
    [ Upstream commit f7c696a56cc7d70515774a24057b473757ec6089 ]
    
    Since the do_div() macro casts the divisor to u32 anyway, remove the
    unnecessary s64 cast and fix the following Coccinelle/coccicheck
    warning reported by do_div.cocci:
    
      WARNING: do_div() does a 64-by-32 division, please consider using div64_s64 instead
    
    Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
    Link: https://lore.kernel.org/r/20240710010520.384009-2-thorsten.blum@toblux.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stable-dep-of: 84f2eecf9501 ("io_uring/napi: check napi_enabled in io_napi_add() before proceeding")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

io_uring/napi: use ktime in busy polling [+ + +]

Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Fri Jul 26 15:24:30 2024 +0100

    io_uring/napi: use ktime in busy polling
    
    [ Upstream commit 342b2e395d5f34c9f111a818556e617939f83a8c ]
    
    It's more natural to use ktime/ns instead of keeping around usec,
    especially since we're comparing it against user provided timers,
    so convert napi busy poll internal handling to ktime. It's also nicer
    since the type (ktime_t vs unsigned long) now tells the unit of measure.
    
    Keep everything as ktime, which we convert to/from microseconds for
    IORING_[UN]REGISTER_NAPI. The net/ busy polling seems to work with
    usec, however it's not real usec as a shift by 10 is used to get it from
    nsecs, see busy_loop_current_time(), so it's easy to get truncated nsec
    back and we get back better precision.
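    
    A small worked example of why a shift by 10 is not a true microsecond
    conversion. The helper names are illustrative; the shift itself is the
    one this message attributes to busy_loop_current_time().

```python
NSEC_PER_USEC = 1000


def net_busy_loop_units(ns: int) -> int:
    """Approximate 'usec' by shifting right by 10, i.e. dividing by 1024."""
    return ns >> 10


def back_to_ns(units: int) -> int:
    """Undo the shift; the low 10 bits of the original value are lost."""
    return units << 10


ns = 1_000_000  # 1 ms
units = net_busy_loop_units(ns)
print(units, ns // NSEC_PER_USEC, back_to_ns(units))  # prints "976 1000 999424"
```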
    
    Note, we can further improve it later by removing the truncation and
    maybe convincing net/ to use ktime/ns instead.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/95e7ec8d095069a3ed5d40a4bc6f8b586698bc7e.1722003776.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stable-dep-of: 84f2eecf9501 ("io_uring/napi: check napi_enabled in io_napi_add() before proceeding")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu: Restore lost return in iommu_report_device_fault() [+ + +]

Author: Barak Biber <bbiber@nvidia.com>
Date:   Thu Aug 1 09:26:04 2024 -0300

    iommu: Restore lost return in iommu_report_device_fault()
    
    [ Upstream commit fca5b78511e98bdff2cdd55c172b23200a7b3404 ]
    
    When iommu_report_device_fault gets called with a partial fault it is
    supposed to collect the fault into the group and then return.
    
    Instead the return was accidentally deleted which results in trying to
    process the fault and an eventual crash.
    
    Deleting the return was a typo, put it back.
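    
    The control flow being restored can be modeled as below. This is a
    simplified sketch: the dictionary key "last_page", the list used as the
    fault group, and the function name are all hypothetical.

```python
def report_fault(group: list, fault: dict, handle) -> None:
    """Collect partial faults; hand the whole group over on the last one."""
    if not fault["last_page"]:
        group.append(fault)
        return  # the early return that the fix puts back
    handle(group + [fault])
    group.clear()


handled = []
pending = []
report_fault(pending, {"id": 1, "last_page": False}, handled.append)
report_fault(pending, {"id": 2, "last_page": True}, handled.append)
print(len(handled), len(handled[0]), len(pending))
```

    Without the early return, the partial fault would fall through to the
    processing step, which is the failure this commit describes.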
    
    Fixes: 3dfa64aecbaf ("iommu: Make iommu_report_device_fault() return void")
    Signed-off-by: Barak Biber <bbiber@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
    Link: https://lore.kernel.org/r/0-v1-e7153d9c8cee+1c6-iommu_fault_fix_jgg@nvidia.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

iommufd/device: Fix hwpt at err_unresv in iommufd_device_do_replace() [+ + +]

Author: Nicolin Chen <nicolinc@nvidia.com>
Date:   Wed Jul 17 22:01:30 2024 -0700

    iommufd/device: Fix hwpt at err_unresv in iommufd_device_do_replace()
    
    commit 950aeefb34923fe3c28ade35fe05f24e2c5b1d55 upstream.
    
    The rewind routine should remove the reserved iovas added to the new hwpt.
    
    Fixes: 89db31635c87 ("iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable")
    Cc: stable@vger.kernel.org
    Link: https://patch.msgid.link/r/20240718050130.1956804-1-nicolinc@nvidia.com
    Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ip6_tunnel: Fix broken GRO [+ + +]

Author: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Date:   Thu Aug 15 17:14:16 2024 +0200

    ip6_tunnel: Fix broken GRO
    
    [ Upstream commit 4b3e33fcc38f7750604b065c55a43e94c5bc3145 ]
    
    GRO code checks for matching layer 2 headers to see if a packet belongs
    to the same flow, and because ip6 tunnel sets dev->hard_header_len,
    this check fails in cases where it shouldn't. To fix this, don't
    set hard_header_len, but use needed_headroom like ipv4/ip_tunnel.c
    does.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
    Link: https://patch.msgid.link/20240815151419.109864-1-tbogendoerfer@suse.de
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv6: fix possible UAF in ip6_finish_output2() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 20 16:08:58 2024 +0000

    ipv6: fix possible UAF in ip6_finish_output2()
    
    [ Upstream commit da273b377ae0d9bd255281ed3c2adb228321687b ]
    
    If skb_expand_head() returns NULL, skb has been freed
    and associated dst/idev could also have been freed.
    
    We need to hold rcu_read_lock() to make sure the dst and
    associated idev are alive.
    
    Fixes: 5796015fa968 ("ipv6: allocate enough headroom in ip6_finish_output2()")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Vasily Averin <vasily.averin@linux.dev>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20240820160859.3786976-3-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv6: prevent possible UAF in ip6_xmit() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 20 16:08:59 2024 +0000

    ipv6: prevent possible UAF in ip6_xmit()
    
    [ Upstream commit 2d5ff7e339d04622d8282661df36151906d0e1c7 ]
    
    If skb_expand_head() returns NULL, skb has been freed
    and the associated dst/idev could also have been freed.
    
    We must use rcu_read_lock() to prevent a possible UAF.
    
    Fixes: 0c9f227bee11 ("ipv6: use skb_expand_head in ip6_xmit")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Vasily Averin <vasily.averin@linux.dev>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20240820160859.3786976-4-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv6: prevent UAF in ip6_send_skb() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 20 16:08:57 2024 +0000

    ipv6: prevent UAF in ip6_send_skb()
    
    [ Upstream commit faa389b2fbaaec7fd27a390b4896139f9da662e3 ]
    
    syzbot reported a UAF in ip6_send_skb() [1]
    
    After ip6_local_out() has returned, we no longer can safely
    dereference rt, unless we hold rcu_read_lock().
    
    A similar issue has been fixed in commit
    a688caa34beb ("ipv6: take rcu lock in rawv6_send_hdrinc()")
    
    Another potential issue in ip6_finish_output2() is handled in a
    separate patch.
    
    [1]
     BUG: KASAN: slab-use-after-free in ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964
    Read of size 8 at addr ffff88806dde4858 by task syz.1.380/6530
    
    CPU: 1 UID: 0 PID: 6530 Comm: syz.1.380 Not tainted 6.11.0-rc3-syzkaller-00306-gdf6cbc62cc9b #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:93 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
      print_address_description mm/kasan/report.c:377 [inline]
      print_report+0x169/0x550 mm/kasan/report.c:488
      kasan_report+0x143/0x180 mm/kasan/report.c:601
      ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964
      rawv6_push_pending_frames+0x75c/0x9e0 net/ipv6/raw.c:588
      rawv6_sendmsg+0x19c7/0x23c0 net/ipv6/raw.c:926
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x1a6/0x270 net/socket.c:745
      sock_write_iter+0x2dd/0x400 net/socket.c:1160
     do_iter_readv_writev+0x60a/0x890
      vfs_writev+0x37c/0xbb0 fs/read_write.c:971
      do_writev+0x1b1/0x350 fs/read_write.c:1018
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7f936bf79e79
    Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f936cd7f038 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
    RAX: ffffffffffffffda RBX: 00007f936c115f80 RCX: 00007f936bf79e79
    RDX: 0000000000000001 RSI: 0000000020000040 RDI: 0000000000000004
    RBP: 00007f936bfe7916 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000000 R14: 00007f936c115f80 R15: 00007fff2860a7a8
     </TASK>
    
    Allocated by task 6530:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      unpoison_slab_object mm/kasan/common.c:312 [inline]
      __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
      kasan_slab_alloc include/linux/kasan.h:201 [inline]
      slab_post_alloc_hook mm/slub.c:3988 [inline]
      slab_alloc_node mm/slub.c:4037 [inline]
      kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4044
      dst_alloc+0x12b/0x190 net/core/dst.c:89
      ip6_blackhole_route+0x59/0x340 net/ipv6/route.c:2670
      make_blackhole net/xfrm/xfrm_policy.c:3120 [inline]
      xfrm_lookup_route+0xd1/0x1c0 net/xfrm/xfrm_policy.c:3313
      ip6_dst_lookup_flow+0x13e/0x180 net/ipv6/ip6_output.c:1257
      rawv6_sendmsg+0x1283/0x23c0 net/ipv6/raw.c:898
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x1a6/0x270 net/socket.c:745
      ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
      ___sys_sendmsg net/socket.c:2651 [inline]
      __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
    Freed by task 45:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
      poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
      __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
      kasan_slab_free include/linux/kasan.h:184 [inline]
      slab_free_hook mm/slub.c:2252 [inline]
      slab_free mm/slub.c:4473 [inline]
      kmem_cache_free+0x145/0x350 mm/slub.c:4548
      dst_destroy+0x2ac/0x460 net/core/dst.c:124
      rcu_do_batch kernel/rcu/tree.c:2569 [inline]
      rcu_core+0xafd/0x1830 kernel/rcu/tree.c:2843
      handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
      __do_softirq kernel/softirq.c:588 [inline]
      invoke_softirq kernel/softirq.c:428 [inline]
      __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
      irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
      instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
      sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
      asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
    
    Last potentially related work creation:
      kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
      __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541
      __call_rcu_common kernel/rcu/tree.c:3106 [inline]
      call_rcu+0x167/0xa70 kernel/rcu/tree.c:3210
      refdst_drop include/net/dst.h:263 [inline]
      skb_dst_drop include/net/dst.h:275 [inline]
      nf_ct_frag6_queue net/ipv6/netfilter/nf_conntrack_reasm.c:306 [inline]
      nf_ct_frag6_gather+0xb9a/0x2080 net/ipv6/netfilter/nf_conntrack_reasm.c:485
      ipv6_defrag+0x2c8/0x3c0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:67
      nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
      nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
      nf_hook include/linux/netfilter.h:269 [inline]
      __ip6_local_out+0x6fa/0x800 net/ipv6/output_core.c:143
      ip6_local_out+0x26/0x70 net/ipv6/output_core.c:153
      ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1959
      rawv6_push_pending_frames+0x75c/0x9e0 net/ipv6/raw.c:588
      rawv6_sendmsg+0x19c7/0x23c0 net/ipv6/raw.c:926
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x1a6/0x270 net/socket.c:745
      sock_write_iter+0x2dd/0x400 net/socket.c:1160
     do_iter_readv_writev+0x60a/0x890
    
    Fixes: 0625491493d9 ("ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20240820160859.3786976-2-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols [+ + +]

Author: Song Liu <song@kernel.org>
Date:   Wed Aug 7 15:05:12 2024 -0700

    kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols
    
    [ Upstream commit 020925ce92990c3bf59ab2cde386ac6d9ec734ff ]
    
    Cleaning up the symbols causes various issues afterwards. Let's sort
    the list based on original name.
    
    Signed-off-by: Song Liu <song@kernel.org>
    Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions")
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Acked-by: Petr Mladek <pmladek@suse.com>
    Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
    Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20240807220513.3100483-2-song@kernel.org
    Signed-off-by: Kees Cook <kees@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kallsyms: get rid of code for absolute kallsyms [+ + +]

Author: Jann Horn <jannh@google.com>
Date:   Wed Feb 21 21:26:53 2024 +0100

    kallsyms: get rid of code for absolute kallsyms
    
    [ Upstream commit 64e166099b69bfc09f667253358a15160b86ea43 ]
    
    Commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture")
    removed the last use of the absolute kallsyms.
    
    Signed-off-by: Jann Horn <jannh@google.com>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Link: https://lore.kernel.org/all/20240221202655.2423854-1-jannh@google.com/
    [masahiroy@kernel.org: rebase the code and reword the commit description]
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Stable-dep-of: 020925ce9299 ("kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kallsyms: Match symbols exactly with CONFIG_LTO_CLANG [+ + +]

Author: Song Liu <song@kernel.org>
Date:   Wed Aug 7 15:05:13 2024 -0700

    kallsyms: Match symbols exactly with CONFIG_LTO_CLANG
    
    [ Upstream commit fb6a421fb6153d97cf3058f9bd550b377b76a490 ]
    
    With CONFIG_LTO_CLANG=y, the compiler may add .llvm.<hash> suffix to
    function names to avoid duplication. APIs like kallsyms_lookup_name()
    and kallsyms_on_each_match_symbol() try to match these symbol names
    without the .llvm.<hash> suffix, e.g., match "c_stop" with symbol
    c_stop.llvm.17132674095431275852. This turned out to be problematic
    for use cases that require exact match, for example, livepatch.
    
    Fix this by making the APIs match symbols exactly.
    
    Also clean up kallsyms_selftests accordingly.
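    
    The behavioural difference can be sketched with plain string matching.
    The helper names below are hypothetical and this is not the kallsyms
    implementation; it only illustrates suffix-stripped versus exact
    matching using the c_stop example above.

```python
def strip_llvm_suffix(name: str) -> str:
    """Drop a '.llvm.<hash>' suffix, if the name has one."""
    base, sep, _ = name.partition(".llvm.")
    return base if sep else name


def lookup_stripped(symbols, query):
    """Old-style matching: compare against names with the suffix removed."""
    return [s for s in symbols if strip_llvm_suffix(s) == query]


def lookup_exact(symbols, query):
    """Matching after the fix: the full symbol name must be equal."""
    return [s for s in symbols if s == query]


symbols = ["c_stop.llvm.17132674095431275852", "c_start"]
print(lookup_stripped(symbols, "c_stop"))
print(lookup_exact(symbols, "c_stop"))
print(lookup_exact(symbols, "c_stop.llvm.17132674095431275852"))
```

    With exact matching, a caller such as livepatch has to pass the full
    suffixed name to find the symbol.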
    
    Signed-off-by: Song Liu <song@kernel.org>
    Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions")
    Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Acked-by: Petr Mladek <pmladek@suse.com>
    Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
    Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20240807220513.3100483-3-song@kernel.org
    Signed-off-by: Kees Cook <kees@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kbuild: avoid scripts/kallsyms parsing /dev/null [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Thu Aug 8 03:03:00 2024 +0900

    kbuild: avoid scripts/kallsyms parsing /dev/null
    
    [ Upstream commit 1472464c6248575bf2d01c7f076b94704bb32c95 ]
    
    On macOS, as reported by Daniel Gomez, getline() sets errno to ENOTTY
    if it is requested to read from /dev/null.
    
    If this is worth fixing, I would rather pass an empty file to
    scripts/kallsyms instead of adding the ugly #ifdef __APPLE__.
    
    Fixes: c442db3f49f2 ("kbuild: remove PROVIDE() for kallsyms symbols")
    Reported-by: Daniel Gomez <da.gomez@samsung.com>
    Closes: https://lore.kernel.org/all/20240807-macos-build-support-v1-12-4cd1ded85694@samsung.com/
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
    Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kbuild: merge temporary vmlinux for BTF and kallsyms [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Mon Jun 10 20:25:18 2024 +0900

    kbuild: merge temporary vmlinux for BTF and kallsyms
    
    [ Upstream commit b1a9a5e04767e2a78783e19c9e55c25812ceccc3 ]
    
    CONFIG_DEBUG_INFO_BTF=y requires one additional link step.
    (.tmp_vmlinux.btf)
    
    CONFIG_KALLSYMS=y requires two additional link steps.
    (.tmp_vmlinux.kallsyms1 and .tmp_vmlinux.kallsyms2)
    
    Enabling both requires three additional link steps.
    
    When CONFIG_DEBUG_INFO_BTF=y and CONFIG_KALLSYMS=y, the current build
    process is as follows:
    
        KSYMS   .tmp_vmlinux.kallsyms0.S
        AS      .tmp_vmlinux.kallsyms0.o
        LD      .tmp_vmlinux.btf             # temporary vmlinux for BTF
        BTF     .btf.vmlinux.bin.o
        LD      .tmp_vmlinux.kallsyms1       # temporary vmlinux for kallsyms step 1
        NM      .tmp_vmlinux.kallsyms1.syms
        KSYMS   .tmp_vmlinux.kallsyms1.S
        AS      .tmp_vmlinux.kallsyms1.o
        LD      .tmp_vmlinux.kallsyms2       # temporary vmlinux for kallsyms step 2
        NM      .tmp_vmlinux.kallsyms2.syms
        KSYMS   .tmp_vmlinux.kallsyms2.S
        AS      .tmp_vmlinux.kallsyms2.o
        LD      vmlinux                      # final vmlinux
    
    This is redundant because the BTF generation and the kallsyms step 1 can
    be performed against the same temporary vmlinux.
    
    When both CONFIG_DEBUG_INFO_BTF and CONFIG_KALLSYMS are enabled, we can
    reduce the number of link steps by one.
    
    This commit changes the build process as follows:
    
        KSYMS   .tmp_vmlinux0.kallsyms.S
        AS      .tmp_vmlinux0.kallsyms.o
        LD      .tmp_vmlinux1                # temporary vmlinux for BTF and kallsyms step 1
        BTF     .tmp_vmlinux1.btf.o
        NM      .tmp_vmlinux1.syms
        KSYMS   .tmp_vmlinux1.kallsyms.S
        AS      .tmp_vmlinux1.kallsyms.o
        LD      .tmp_vmlinux2                # temporary vmlinux for kallsyms step 2
        NM      .tmp_vmlinux2.syms
        KSYMS   .tmp_vmlinux2.kallsyms.S
        AS      .tmp_vmlinux2.kallsyms.o
        LD      vmlinux                      # final vmlinux
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Stable-dep-of: 1472464c6248 ("kbuild: avoid scripts/kallsyms parsing /dev/null")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kbuild: refactor variables in scripts/link-vmlinux.sh [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Mon Jun 10 20:25:16 2024 +0900

    kbuild: refactor variables in scripts/link-vmlinux.sh
    
    [ Upstream commit ddf41329839f49dadf26973cd845ea160ac1784d ]
    
    Clean up the variables in scripts/link-vmlinux.sh
    
     - Specify the extra objects directly in vmlinux_link()
     - Move the AS rule to kallsyms()
     - Set kallsymso and btf_vmlinux_bin_o where they are generated
     - Remove unneeded variable, kallsymso_prev
     - Introduce the btf_data variable
     - Introduce the strip_debug flag instead of checking the output name
    
    No functional change intended.
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
    Stable-dep-of: 020925ce9299 ("kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kbuild: remove PROVIDE() for kallsyms symbols [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Mon Jun 10 20:25:17 2024 +0900

    kbuild: remove PROVIDE() for kallsyms symbols
    
    [ Upstream commit c442db3f49f27e5a60a641b2ac9a3c6320796ed6 ]
    
    This reimplements commit 951bcae6c5a0 ("kallsyms: Avoid weak references
    for kallsyms symbols") because I am not a big fan of PROVIDE().
    
    As an alternative solution, this commit prepends one more kallsyms step.
    
        KSYMS   .tmp_vmlinux.kallsyms0.S          # added
        AS      .tmp_vmlinux.kallsyms0.o          # added
        LD      .tmp_vmlinux.btf
        BTF     .btf.vmlinux.bin.o
        LD      .tmp_vmlinux.kallsyms1
        NM      .tmp_vmlinux.kallsyms1.syms
        KSYMS   .tmp_vmlinux.kallsyms1.S
        AS      .tmp_vmlinux.kallsyms1.o
        LD      .tmp_vmlinux.kallsyms2
        NM      .tmp_vmlinux.kallsyms2.syms
        KSYMS   .tmp_vmlinux.kallsyms2.S
        AS      .tmp_vmlinux.kallsyms2.o
        LD      vmlinux
    
    Step 0 takes /dev/null as input, and generates .tmp_vmlinux.kallsyms0.o,
    which has a valid kallsyms format with an empty symbol list, and can be
    linked to vmlinux. Since it is really small, the added compile-time cost
    is negligible.
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Acked-by: Ard Biesheuvel <ardb@kernel.org>
    Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
    Stable-dep-of: 020925ce9299 ("kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

kcm: Serialise kcm_sendmsg() for the same socket. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Thu Aug 15 15:04:37 2024 -0700

    kcm: Serialise kcm_sendmsg() for the same socket.
    
    [ Upstream commit 807067bf014d4a3ae2cc55bd3de16f22a01eb580 ]
    
    syzkaller reported UAF in kcm_release(). [0]
    
    The scenario is
    
      1. Thread A builds a skb with MSG_MORE and sets kcm->seq_skb.
    
      2. Thread A resumes building skb from kcm->seq_skb but is blocked
         by sk_stream_wait_memory()
    
      3. Thread B calls sendmsg() concurrently, finishes building kcm->seq_skb
         and puts the skb to the write queue
    
      4. Thread A faces an error and finally frees skb that is already in the
         write queue
    
      5. kcm_release() double-frees the skb in the write queue
    
    When a thread is building a MSG_MORE skb, another thread must not touch it.
    
    Let's add a per-sk mutex and serialise kcm_sendmsg().
    
    [0]:
    BUG: KASAN: slab-use-after-free in __skb_unlink include/linux/skbuff.h:2366 [inline]
    BUG: KASAN: slab-use-after-free in __skb_dequeue include/linux/skbuff.h:2385 [inline]
    BUG: KASAN: slab-use-after-free in __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline]
    BUG: KASAN: slab-use-after-free in __skb_queue_purge include/linux/skbuff.h:3181 [inline]
    BUG: KASAN: slab-use-after-free in kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691
    Read of size 8 at addr ffff0000ced0fc80 by task syz-executor329/6167
    
    CPU: 1 PID: 6167 Comm: syz-executor329 Tainted: G    B              6.8.0-rc5-syzkaller-g9abbc24128bc #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
    Call trace:
     dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291
     show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
     print_address_description mm/kasan/report.c:377 [inline]
     print_report+0x178/0x518 mm/kasan/report.c:488
     kasan_report+0xd8/0x138 mm/kasan/report.c:601
     __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381
     __skb_unlink include/linux/skbuff.h:2366 [inline]
     __skb_dequeue include/linux/skbuff.h:2385 [inline]
     __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline]
     __skb_queue_purge include/linux/skbuff.h:3181 [inline]
     kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691
     __sock_release net/socket.c:659 [inline]
     sock_close+0xa4/0x1e8 net/socket.c:1421
     __fput+0x30c/0x738 fs/file_table.c:376
     ____fput+0x20/0x30 fs/file_table.c:404
     task_work_run+0x230/0x2e0 kernel/task_work.c:180
     exit_task_work include/linux/task_work.h:38 [inline]
     do_exit+0x618/0x1f64 kernel/exit.c:871
     do_group_exit+0x194/0x22c kernel/exit.c:1020
     get_signal+0x1500/0x15ec kernel/signal.c:2893
     do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249
     do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148
     exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
     exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
     el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713
     el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
     el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
    
    Allocated by task 6166:
     kasan_save_stack mm/kasan/common.c:47 [inline]
     kasan_save_track+0x40/0x78 mm/kasan/common.c:68
     kasan_save_alloc_info+0x70/0x84 mm/kasan/generic.c:626
     unpoison_slab_object mm/kasan/common.c:314 [inline]
     __kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:340
     kasan_slab_alloc include/linux/kasan.h:201 [inline]
     slab_post_alloc_hook mm/slub.c:3813 [inline]
     slab_alloc_node mm/slub.c:3860 [inline]
     kmem_cache_alloc_node+0x204/0x4c0 mm/slub.c:3903
     __alloc_skb+0x19c/0x3d8 net/core/skbuff.c:641
     alloc_skb include/linux/skbuff.h:1296 [inline]
     kcm_sendmsg+0x1d3c/0x2124 net/kcm/kcmsock.c:783
     sock_sendmsg_nosec net/socket.c:730 [inline]
     __sock_sendmsg net/socket.c:745 [inline]
     sock_sendmsg+0x220/0x2c0 net/socket.c:768
     splice_to_socket+0x7cc/0xd58 fs/splice.c:889
     do_splice_from fs/splice.c:941 [inline]
     direct_splice_actor+0xec/0x1d8 fs/splice.c:1164
     splice_direct_to_actor+0x438/0xa0c fs/splice.c:1108
     do_splice_direct_actor fs/splice.c:1207 [inline]
     do_splice_direct+0x1e4/0x304 fs/splice.c:1233
     do_sendfile+0x460/0xb3c fs/read_write.c:1295
     __do_sys_sendfile64 fs/read_write.c:1362 [inline]
     __se_sys_sendfile64 fs/read_write.c:1348 [inline]
     __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1348
     __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
     invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
     el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
     do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
     el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
     el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
     el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
    
    Freed by task 6167:
     kasan_save_stack mm/kasan/common.c:47 [inline]
     kasan_save_track+0x40/0x78 mm/kasan/common.c:68
     kasan_save_free_info+0x5c/0x74 mm/kasan/generic.c:640
     poison_slab_object+0x124/0x18c mm/kasan/common.c:241
     __kasan_slab_free+0x3c/0x78 mm/kasan/common.c:257
     kasan_slab_free include/linux/kasan.h:184 [inline]
     slab_free_hook mm/slub.c:2121 [inline]
     slab_free mm/slub.c:4299 [inline]
     kmem_cache_free+0x15c/0x3d4 mm/slub.c:4363
     kfree_skbmem+0x10c/0x19c
     __kfree_skb net/core/skbuff.c:1109 [inline]
     kfree_skb_reason+0x240/0x6f4 net/core/skbuff.c:1144
     kfree_skb include/linux/skbuff.h:1244 [inline]
     kcm_release+0x104/0x4c8 net/kcm/kcmsock.c:1685
     __sock_release net/socket.c:659 [inline]
     sock_close+0xa4/0x1e8 net/socket.c:1421
     __fput+0x30c/0x738 fs/file_table.c:376
     ____fput+0x20/0x30 fs/file_table.c:404
     task_work_run+0x230/0x2e0 kernel/task_work.c:180
     exit_task_work include/linux/task_work.h:38 [inline]
     do_exit+0x618/0x1f64 kernel/exit.c:871
     do_group_exit+0x194/0x22c kernel/exit.c:1020
     get_signal+0x1500/0x15ec kernel/signal.c:2893
     do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249
     do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148
     exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
     exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
     el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713
     el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
     el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
    
    The buggy address belongs to the object at ffff0000ced0fc80
     which belongs to the cache skbuff_head_cache of size 240
    The buggy address is located 0 bytes inside of
     freed 240-byte region [ffff0000ced0fc80, ffff0000ced0fd70)
    
    The buggy address belongs to the physical page:
    page:00000000d35f4ae4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ed0f
    flags: 0x5ffc00000000800(slab|node=0|zone=2|lastcpupid=0x7ff)
    page_type: 0xffffffff()
    raw: 05ffc00000000800 ffff0000c1cbf640 fffffdffc3423100 dead000000000004
    raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
    
    Memory state around the buggy address:
     ffff0000ced0fb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ffff0000ced0fc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
    >ffff0000ced0fc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
     ffff0000ced0fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
     ffff0000ced0fd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
    
    Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
    Reported-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=b72d86aa5df17ce74c60
    Tested-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20240815220437.69511-1-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
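
Editorial sketch (not part of the commit): a userspace toy model of the serialisation idea, assuming one lock per socket that is held for the whole send path. ToyKcmSock, tx_mutex and the bytearray standing in for an skb are all made up for this illustration; this is not the kernel implementation.

```python
import threading


class ToyKcmSock:
    """Toy socket: a partial MSG_MORE message is only touched under the lock."""

    def __init__(self):
        self.tx_mutex = threading.Lock()  # hypothetical per-socket mutex
        self.seq_skb = None               # partially built message, if any
        self.write_queue = []             # completed messages

    def sendmsg(self, data: bytes, more: bool = False) -> None:
        # The whole build-or-queue sequence runs under the mutex, so no other
        # sender can pick up, finish, or free the partial message meanwhile.
        with self.tx_mutex:
            skb = self.seq_skb if self.seq_skb is not None else bytearray()
            self.seq_skb = None
            skb += data
            if more:
                self.seq_skb = skb           # keep building on the next call
            else:
                self.write_queue.append(skb)  # queued exactly once
```

Because the partial message is detached from seq_skb before it is extended, a given buffer ends up either back in seq_skb or in the write queue, never both.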

KEYS: trusted: dcp: fix leak of blob encryption key [+ + +]

Author: David Gstir <david@sigma-star.at>
Date:   Wed Jul 17 13:28:45 2024 +0200

    KEYS: trusted: dcp: fix leak of blob encryption key
    
    commit 0e28bf61a5f9ab30be3f3b4eafb8d097e39446bb upstream.
    
    Trusted keys unseal the key blob on load, but keep the sealed payload in
    the blob field so that every subsequent read (export) will simply
    convert this field to hex and send it to userspace.
    
    With DCP-based trusted keys, we decrypt the blob encryption key (BEK)
    in the kernel due to hardware limitations and then decrypt the blob
    payload. BEK decryption is done in-place, which means that the trusted
    key blob field is modified and consequently holds the BEK in plain
    text. Every subsequent read of that key thus sends the plain text BEK
    instead of the encrypted BEK to userspace.
    
    This issue only occurs when importing a trusted DCP-based key and
    then exporting it again. This should rarely happen as the common use cases
    are to either create a new trusted key and export it, or import a key
    blob and then just use it without exporting it again.
    
    Fix this by performing BEK decryption and encryption in a dedicated
    buffer. Furthermore, always wipe the plain text BEK buffer to prevent
    leaking the key via uninitialized memory.
    
    Cc: stable@vger.kernel.org # v6.10+
    Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
    Signed-off-by: David Gstir <david@sigma-star.at>
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
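
Editorial sketch (not part of the commit): a toy model of why in-place BEK decryption leaks the key on export, and how a dedicated scratch buffer avoids it. The XOR "cipher", the blob layout (16-byte encrypted BEK followed by the payload), DEVICE_KEY and all function names are assumptions made for illustration only.

```python
DEVICE_KEY = bytes(range(1, 17))  # stand-in for the key held by the hardware
BEK_LEN = 16


def toy_cipher(data, key):
    """XOR stand-in for the real cipher; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def seal(payload, bek):
    """Build a toy blob: encrypted BEK followed by the BEK-encrypted payload."""
    return bytearray(toy_cipher(bek, DEVICE_KEY) + toy_cipher(payload, bek))


def unseal_in_place(blob):
    """Buggy variant: the blob that is later exported now holds the plain BEK."""
    blob[:BEK_LEN] = toy_cipher(blob[:BEK_LEN], DEVICE_KEY)
    return toy_cipher(blob[BEK_LEN:], blob[:BEK_LEN])


def unseal_with_scratch(blob):
    """Fixed variant: decrypt the BEK into a scratch buffer, then wipe it."""
    plain_bek = bytearray(toy_cipher(blob[:BEK_LEN], DEVICE_KEY))
    try:
        return toy_cipher(blob[BEK_LEN:], plain_bek)
    finally:
        plain_bek[:] = bytes(len(plain_bek))  # always zero the plain text BEK
```

With the scratch-buffer variant the blob is byte-for-byte unchanged after unsealing, so a later export returns the sealed form.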

KEYS: trusted: fix DCP blob payload length assignment [+ + +]

Author: David Gstir <david@sigma-star.at>
Date:   Wed Jul 17 13:28:44 2024 +0200

    KEYS: trusted: fix DCP blob payload length assignment
    
    commit 6486cad00a8b7f8585983408c152bbe33dda529b upstream.
    
    The DCP trusted key type uses the wrong helper function to store
    the blob's payload length which can lead to the wrong byte order
    being used in case this would ever run on big endian architectures.
    
    Fix this by using the correct helper function.
    
    Cc: stable@vger.kernel.org # v6.10+
    Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
    Suggested-by: Richard Weinberger <richard@nod.at>
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202405240610.fj53EK0q-lkp@intel.com/
    Signed-off-by: David Gstir <david@sigma-star.at>
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
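
Editorial sketch (not part of the commit): why the byte order of a stored length field matters. The example uses Python's struct module rather than the kernel helpers, and the function names are made up; it only shows that a 32-bit length written in big-endian order is misread by a reader that expects little-endian.

```python
import struct


def store_len_le32(length: int) -> bytes:
    """Store a 32-bit length in little-endian order on every host."""
    return struct.pack("<I", length)


def store_len_be32(length: int) -> bytes:
    """What a big-endian host would produce if it stored the raw native value."""
    return struct.pack(">I", length)


def load_len_le32(raw: bytes) -> int:
    """Reader that expects the on-disk field to be little-endian."""
    return struct.unpack("<I", raw)[0]
```

A length of 32 stored with store_len_be32() reads back as 0x20000000 through load_len_le32(), which is the kind of mismatch an explicit little-endian store avoids.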

ksmbd: fix race condition between destroy_previous_session() and smb2 operations() [+ + +]

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Tue Aug 27 09:27:56 2024 +0900

    ksmbd: fix race condition between destroy_previous_session() and smb2 operations()
    
    [ Upstream commit 76e98a158b207771a6c9a0de0a60522a446a3447 ]
    
    If there is a ->PreviousSessionId field in the session setup request,
    the session of the previous connection should be destroyed.
    During this, if smb2 operation requests in the previous session are
    still being processed, a race could occur with
    ksmbd_destroy_file_table().
    This patch sets conn->status to KSMBD_SESS_NEED_RECONNECT to block
    incoming operations and waits until ongoing operations are complete
    (i.e. idle) before destroying the previous session.
    
    Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2")
    Cc: stable@vger.kernel.org # v6.6+
    Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-25040
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ksmbd: the buffer of smb2 query dir response has at least 1 byte [+ + +]

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Tue Aug 20 22:07:38 2024 +0900

    ksmbd: the buffer of smb2 query dir response has at least 1 byte
    
    commit ce61b605a00502c59311d0a4b1f58d62b48272d0 upstream.
    
    When the STATUS_NO_MORE_FILES status is set in the smb2 query dir
    response, ->StructureSize is set to 9, which means the buffer has
    1 byte. This issue occurs because ->Buffer[1] in
    smb2_query_directory_rsp was changed to a flex-array.
    
    Fixes: eb3e28c1e89b ("smb3: Replace smb2pdu 1-element arrays with flex-arrays")
    Cc: stable@vger.kernel.org # v6.1+
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3 [+ + +]

Author: Marc Zyngier <maz@kernel.org>
Date:   Tue Aug 20 11:03:38 2024 +0100

    KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3
    
    commit 3e6245ebe7ef341639e9a7e402b3ade8ad45a19f upstream.
    
    On a system with a GICv3, if a guest hasn't been configured with
    GICv3 and the host is not capable of GICv2 emulation,
    a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2.
    
    We therefore try to emulate the SGI access, only to hit a NULL
    pointer as no private interrupt is allocated (no GIC, remember?).
    
    The obvious fix is to give the guest what it deserves, in the
    shape of a UNDEF exception.
    
    Reported-by: Alexander Potapenko <glider@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org
    Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: vgic-debug: Don't put unmarked LPIs [+ + +]

Author: Zenghui Yu <yuzenghui@huawei.com>
Date:   Sat Aug 17 18:15:41 2024 +0800

    KVM: arm64: vgic-debug: Don't put unmarked LPIs
    
    commit 2240a50e6294214de791729e9dcba6880fa7e44e upstream.
    
    If there were LPIs being mapped behind our back (i.e., between .start() and
    .stop()), we would put them at iter_unmark_lpis() without checking if they
    were actually *marked*, which is obviously not good.
    
    Switch to use the xa_for_each_marked() iterator to fix it.
    
    Cc: stable@vger.kernel.org
    Fixes: 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator")
    Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20240817101541.1664-1-yuzenghui@huawei.com
    Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
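
Editorial sketch (not part of the commit): a toy model of the mark/unmark imbalance. ToyLpi, the dict standing in for the xarray, and the reference counting are assumptions for illustration; only the idea of iterating over marked entries instead of all entries comes from the commit message.

```python
class ToyLpi:
    """Toy LPI with a reference count and a debug-iterator mark."""

    def __init__(self):
        self.refcount = 1
        self.marked = False


def iter_mark_lpis(lpis):
    """At .start(): take a reference on every existing LPI and mark it."""
    for lpi in lpis.values():
        lpi.refcount += 1
        lpi.marked = True


def iter_unmark_all(lpis):
    """Buggy .stop(): drops a reference on every LPI, marked or not."""
    for lpi in lpis.values():
        lpi.marked = False
        lpi.refcount -= 1


def iter_unmark_marked(lpis):
    """Fixed .stop(): only visits entries that were actually marked."""
    for lpi in lpis.values():
        if lpi.marked:
            lpi.marked = False
            lpi.refcount -= 1
```

An LPI mapped between the two calls was never marked, so the fixed variant leaves its reference count alone while the buggy one drops a reference it never took.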

KVM: s390: fix validity interception issue when gisa is switched off [+ + +]

Author: Michael Mueller <mimu@linux.ibm.com>
Date:   Thu Aug 1 14:31:09 2024 +0200

    KVM: s390: fix validity interception issue when gisa is switched off
    
    commit 5a44bb061d04b0306f2aa8add761d86d152b9377 upstream.
    
    We might run into a SIE validity if gisa has been disabled either via using
    kernel parameter "kvm.use_gisa=0" or by setting the related sysfs
    attribute to N (echo N >/sys/module/kvm/parameters/use_gisa).
    
    The validity is caused by an invalid value in the SIE control block's
    gisa designation. That happens because we pass the uninitialized gisa
    origin to virt_to_phys() before writing it to the gisa designation.
    
    To fix this we return 0 in kvm_s390_get_gisa_desc() if the origin is 0.
    kvm_s390_get_gisa_desc() is used to determine which gisa designation to
    set in the SIE control block. A value of 0 in the gisa designation disables
    gisa usage.
    
    The issue surfaces in the host kernel with the following kernel message as
    soon as a new KVM guest start is attempted.
    
    kvm: unhandled validity intercept 0x1011
    WARNING: CPU: 0 PID: 781237 at arch/s390/kvm/intercept.c:101 kvm_handle_sie_intercept+0x42e/0x4d0 [kvm]
    Modules linked in: vhost_net tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat x_tables nf_nat_tftp nf_conntrack_tftp vfio_pci_core irqbypass vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables sunrpc mlx5_ib ib_uverbs ib_core mlx5_core uvdevice s390_trng eadm_sch vfio_ccw zcrypt_cex4 mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core loop drm_panel_orientation_quirks configfs nfnetlink lcs ctcm fsm dm_service_time ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common dm_mirror dm_region_hash dm_log zfcp scsi_transport_fc scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey zcrypt dm_multipath rng_core autofs4 [last unloaded: vfio_pci]
    CPU: 0 PID: 781237 Comm: CPU 0/KVM Not tainted 6.10.0-08682-gcad9f11498ea #6
    Hardware name: IBM 3931 A01 701 (LPAR)
    Krnl PSW : 0704c00180000000 000003d93deb0122 (kvm_handle_sie_intercept+0x432/0x4d0 [kvm])
               R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
    Krnl GPRS: 000003d900000027 000003d900000023 0000000000000028 000002cd00000000
               000002d063a00900 00000359c6daf708 00000000000bebb5 0000000000001eff
               000002cfd82e9000 000002cfd80bc000 0000000000001011 000003d93deda412
               000003ff8962df98 000003d93de77ce0 000003d93deb011e 00000359c6daf960
    Krnl Code: 000003d93deb0112: c020fffe7259       larl    %r2,000003d93de7e5c4
               000003d93deb0118: c0e53fa8beac       brasl   %r14,000003d9bd3c7e70
              #000003d93deb011e: af000000           mc      0,0
              >000003d93deb0122: a728ffea           lhi     %r2,-22
               000003d93deb0126: a7f4fe24           brc     15,000003d93deafd6e
               000003d93deb012a: 9101f0b0           tm      176(%r15),1
               000003d93deb012e: a774fe48           brc     7,000003d93deafdbe
               000003d93deb0132: 40a0f0ae           sth     %r10,174(%r15)
    Call Trace:
     [<000003d93deb0122>] kvm_handle_sie_intercept+0x432/0x4d0 [kvm]
    ([<000003d93deb011e>] kvm_handle_sie_intercept+0x42e/0x4d0 [kvm])
     [<000003d93deacc10>] vcpu_post_run+0x1d0/0x3b0 [kvm]
     [<000003d93deaceda>] __vcpu_run+0xea/0x2d0 [kvm]
     [<000003d93dead9da>] kvm_arch_vcpu_ioctl_run+0x16a/0x430 [kvm]
     [<000003d93de93ee0>] kvm_vcpu_ioctl+0x190/0x7c0 [kvm]
     [<000003d9bd728b4e>] vfs_ioctl+0x2e/0x70
     [<000003d9bd72a092>] __s390x_sys_ioctl+0xc2/0xd0
     [<000003d9be0e9222>] __do_syscall+0x1f2/0x2e0
     [<000003d9be0f9a90>] system_call+0x70/0x98
    Last Breaking-Event-Address:
     [<000003d9bd3c7f58>] __warn_printk+0xe8/0xf0
    
    Cc: stable@vger.kernel.org
    Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
    Fixes: fe0ef0030463 ("KVM: s390: sort out physical vs virtual pointers usage")
    Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
    Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Link: https://lore.kernel.org/r/20240801123109.2782155-1-mimu@linux.ibm.com
    Message-ID: <20240801123109.2782155-1-mimu@linux.ibm.com>
    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

libfs: fix infinite directory reads for offset dir [+ + +]

Author: yangerkun <yangerkun@huawei.com>
Date:   Wed Jul 31 12:38:35 2024 +0800

    libfs: fix infinite directory reads for offset dir
    
    [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ]
    
    After we switched tmpfs dir operations from simple_dir_operations to
    simple_offset_dir_operations, every rename adds the new dentry to the
    dest dir's maple tree (&SHMEM_I(inode)->dir_offsets->mt) with a free
    key starting from octx->next_offset, and then sets next_offset to
    free key + 1. Combined with renames happening at the same time, this
    leads to an infinite readdir, which fails generic/736 in xfstests
    (details below).
    
    1. create 5000 files(1 2 3...) under one dir
    2. call readdir(man 3 readdir) once, and get one entry
    3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
    4. loop 2~3, until readdir returns nothing or we loop too many
       times (tmpfs breaks the test with the second condition)
    
    We use the same logic as commit 9b378f6ad48cf ("btrfs: fix infinite
    directory reads") to fix it: record the last_index when we open the
    dir, and do not emit entries whose index >= last_index. The
    file->private_data now used in offset dir can be used directly to do
    this, and we also update the last_index when we llseek the dir file.
    
    Fixes: a2e459555c5f ("shmem: stable directory offsets")
    Signed-off-by: yangerkun <yangerkun@huawei.com>
    Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com
    Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
    [brauner: only update last_index after seek when offset is zero like Jan suggested]
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
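
Editorial sketch (not part of the commit): a toy model of the readdir/rename loop from the test description, with and without a last_index bound recorded at open time. ToyOffsetDir, churn() and the dict standing in for the maple tree are made up for this illustration; offsets start at 0 here purely for simplicity.

```python
class ToyOffsetDir:
    """Toy directory whose entries live at monotonically growing offsets."""

    def __init__(self):
        self.entries = {}     # offset -> name
        self.next_offset = 0

    def create(self, name):
        self.entries[self.next_offset] = name
        self.next_offset += 1

    def rename(self, old, new):
        # A rename drops the old entry and re-adds it at a fresh, higher offset.
        off = next(o for o, n in self.entries.items() if n == old)
        del self.entries[off]
        self.create(new)

    def open(self, bounded):
        # The bounded variant records the upper limit when the dir is opened.
        return {"pos": 0, "last_index": self.next_offset if bounded else None}

    def readdir_one(self, f):
        candidates = [o for o in self.entries if o >= f["pos"]]
        if not candidates:
            return None
        off = min(candidates)
        if f["last_index"] is not None and off >= f["last_index"]:
            return None       # entry was added after open: do not emit it
        f["pos"] = off + 1
        return self.entries[off]


def churn(bounded, nfiles=20, cap=1000):
    """Read one entry, rename it away and back, repeat; return reads done."""
    d = ToyOffsetDir()
    for i in range(nfiles):
        d.create(str(i))
    f = d.open(bounded)
    reads = 0
    while reads < cap:
        name = d.readdir_one(f)
        if name is None:
            break
        reads += 1
        d.rename(name, "TEMPFILE")
        d.rename("TEMPFILE", name)
    return reads
```

In this model the bounded reader stops after one pass over the original entries, while the unbounded one keeps finding the entry it just renamed and only stops at the iteration cap.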

Linux: Linux 6.10.7 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Aug 29 17:36:13 2024 +0200

    Linux 6.10.7
    
    Link: https://lore.kernel.org/r/20240827143833.371588371@linuxfoundation.org
    Tested-by: Ronald Warsow <rwarsow@gmx.de>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Salvatore Bonaccorso <carnil@debian.org>
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Tested-by: Miguel Ojeda <ojeda@kernel.org>
    Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: kernelci.org bot <bot@kernelci.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Makefile: add $(srctree) to dependency of compile_commands.json target [+ + +]

Author: Alexandre Courbot <gnurou@gmail.com>
Date:   Sun Aug 4 14:50:57 2024 +0900

    Makefile: add $(srctree) to dependency of compile_commands.json target
    
    [ Upstream commit 6fc9aacad49e3fbecd270c266850d50c453d52ef ]
    
    When trying to build compile_commands.json for an external module against
    the kernel built in a separate output directory, the following error is
    displayed:
    
      make[1]: *** No rule to make target 'scripts/clang-tools/gen_compile_commands.py',
      needed by 'compile_commands.json'. Stop.
    
    This is because gen_compile_commands.py was previously looked up using a
    relative path to $(srctree), but commit b1992c3772e6 ("kbuild: use
    $(src) instead of $(srctree)/$(src) for source directory") stopped
    defining VPATH for external module builds.
    
    Prefixing gen_compile_commands.py with $(srctree) fixes the problem.
    
    Fixes: b1992c3772e6 ("kbuild: use $(src) instead of $(srctree)/$(src) for source directory")
    Signed-off-by: Alexandre Courbot <gnurou@gmail.com>
    Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

md/raid1: Fix data corruption for degraded array with slow disk [+ + +]

Author: Yu Kuai <yukuai3@huawei.com>
Date:   Sat Aug 3 17:11:37 2024 +0800

    md/raid1: Fix data corruption for degraded array with slow disk
    
    commit c916ca35308d3187c9928664f9be249b22a3a701 upstream.
    
    read_balance() will avoid reading from slow disks as much as possible,
    however, if valid data only lands in slow disks, and a new normal disk
    is still in recovery, unrecovered data can be read:
    
    raid1_read_request
     read_balance
      raid1_should_read_first
      -> return false
      choose_best_rdev
      -> normal disk is not recovered, return -1
      choose_bb_rdev
      -> missing the checking of recovery, return the normal disk
     -> read unrecovered data
    
    Root cause is that the checking of recovery is missing in
    choose_bb_rdev(). Hence add such checking to fix the problem.
    
    Also fix similar problem in choose_slow_rdev().
    
    Cc: stable@vger.kernel.org
    Fixes: 9f3ced792203 ("md/raid1: factor out choose_bb_rdev() from read_balance()")
    Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
    Reported-and-tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
    Closes: https://lore.kernel.org/all/9952f532-2554-44bf-b906-4880b2e88e3a@o2.pl/
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Link: https://lore.kernel.org/r/20240803091137.3197008-1-yukuai1@huaweicloud.com
    Signed-off-by: Song Liu <song@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
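
Editorial sketch (not part of the commit): a heavily simplified toy model of the three-stage selection in the call chain above. ToyRdev, is_recovered() and toy_read_balance() are made-up names, and the real read_balance() weighs many more factors (bad blocks, queue depth, head position); the sketch only shows how a fallback stage without a recovery check can hand back a disk that has not been rebuilt yet.

```python
from dataclasses import dataclass


@dataclass
class ToyRdev:
    name: str
    in_sync: bool
    recovery_offset: int = 0   # sectors rebuilt so far (toy meaning)
    slow: bool = False


def is_recovered(rdev, sector, sectors):
    """True if the requested range holds valid data on this disk."""
    return rdev.in_sync or sector + sectors <= rdev.recovery_offset


def toy_read_balance(rdevs, sector, sectors, check_recovery):
    # Stage 1 (stands in for choose_best_rdev): normal disks with valid data.
    for i, r in enumerate(rdevs):
        if not r.slow and is_recovered(r, sector, sectors):
            return i
    # Stage 2 (stands in for choose_bb_rdev): normal disks as a fallback.
    for i, r in enumerate(rdevs):
        if r.slow:
            continue
        if check_recovery and not is_recovered(r, sector, sectors):
            continue   # the added check: skip ranges not rebuilt yet
        return i
    # Stage 3 (stands in for choose_slow_rdev): slow disks as a last resort.
    for i, r in enumerate(rdevs):
        if r.slow and (not check_recovery or is_recovered(r, sector, sectors)):
            return i
    return -1
```

With one in-sync slow disk and one normal disk still in recovery, a read beyond the rebuilt range is served by the normal disk when check_recovery is False and by the slow disk when it is True.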

media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Sun Jul 21 17:38:40 2024 +0200

    media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
    
    commit 63de936b513f7a9ce559194d3269ac291f4f4662 upstream.
    
    Commit a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG)
    support") broke BYT support because it removed a seemingly unused field
    from struct sh_css_sp_config and a seemingly unused value from enum
    ia_css_input_mode.
    
    But these are part of the ABI between the kernel and firmware on ISP2400
    and this part of the TPG support removal changes broke ISP2400 support.
    
    ISP2401 support was not affected because on ISP2401 only a part of
    struct sh_css_sp_config is used.
    
    Restore the removed field and enum value to fix this.
    
    Fixes: a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG) support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

memcg_write_event_control(): fix a user-triggerable oops [+ + +]

Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Jul 21 14:45:08 2024 -0400

    memcg_write_event_control(): fix a user-triggerable oops
    
    commit 046667c4d3196938e992fba0dfcde570aa85cd0e upstream.
    
    we are *not* guaranteed that anything past the terminating NUL
    is mapped (let alone initialized with anything sane).
    
    Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
    Cc: stable@vger.kernel.org
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: Loongson64: Set timer mode in cpu-probe [+ + +]

Author: Jiaxun Yang <jiaxun.yang@flygoat.com>
Date:   Tue Jul 23 17:15:44 2024 +0800

    MIPS: Loongson64: Set timer mode in cpu-probe
    
    commit 1cb6ab446424649f03c82334634360c2e3043684 upstream.
    
    Loongson64 C and G processors have an EXTIMER feature which
    conflicts with the CP0 counter.
    
    Although the processor resets in EXTIMER disabled & INTIMER
    enabled mode, which is compatible with MIPS CP0 compare, firmware
    may attempt to enable EXTIMER and interfere with CP0 compare.
    
    Set timer mode back to MIPS compatible mode to fix booting on
    systems with such firmware before we have an actual driver for
    EXTIMER.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mlxbf_gige: disable RX filters until RX path initialized [+ + +]

Author: David Thompson <davthompson@nvidia.com>
Date:   Fri Aug 9 12:36:12 2024 -0400

    mlxbf_gige: disable RX filters until RX path initialized
    
    [ Upstream commit df934abb185c71c9f2fa07a5013672d0cbd36560 ]
    
    A recent change to the driver exposed a bug where the MAC RX
    filters (unicast MAC, broadcast MAC, and multicast MAC) are
    configured and enabled before the RX path is fully initialized.
    The result of this bug is that after the PHY is started packets
    that match these MAC RX filters start to flow into the RX FIFO.
    And then, after rx_init() is completed, these packets will go
    into the driver RX ring as well. If enough packets are received
    to fill the RX ring (default size is 128 packets) before the call
    to request_irq() completes, the driver RX function becomes stuck.
    
    This bug is intermittent but is most likely to be seen where the
    oob_net0 interface is connected to a busy network with lots of
    broadcast and multicast traffic.
    
    All the MAC RX filters must be disabled until the RX path is ready,
    i.e. all initialization is done and all the IRQs are installed.
    
    Fixes: f7442a634ac0 ("mlxbf_gige: call request_irq() after NAPI initialized")
    Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
    Signed-off-by: David Thompson <davthompson@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20240809163612.12852-1-davthompson@nvidia.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mm/hugetlb: fix hugetlb vs. core-mm PT locking [+ + +]

Author: David Hildenbrand <david@redhat.com>
Date:   Thu Aug 1 22:47:48 2024 +0200

    mm/hugetlb: fix hugetlb vs. core-mm PT locking
    
    commit 5f75cfbd6bb02295ddaed48adf667b6c828ce07b upstream.
    
    We recently made GUP's common page table walking code also walk hugetlb
    VMAs without most hugetlb special-casing, preparing for the future of
    having less hugetlb-specific page table walking code in the codebase.
    Turns out that we missed one page table locking detail: page table locking
    for hugetlb folios that are not mapped using a single PMD/PUD.
    
    Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
    hugetlb folios on arm64 with 4 KiB base page size).  GUP, as it walks the
    page tables, will perform a pte_offset_map_lock() to grab the PTE table
    lock.
    
    However, hugetlb that concurrently modifies these page tables would
    actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
    locks would differ.  Something similar can happen right now with hugetlb
    folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
    
    This issue can be reproduced [1], for example triggering:
    
    [ 3105.936100] ------------[ cut here ]------------
    [ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
    [ 3105.944634] Modules linked in: [...]
    [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
    [ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
    [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [ 3105.991108] pc : try_grab_folio+0x11c/0x188
    [ 3105.994013] lr : follow_page_pte+0xd8/0x430
    [ 3105.996986] sp : ffff80008eafb8f0
    [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
    [ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
    [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
    [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
    [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
    [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
    [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
    [ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
    [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
    [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
    [ 3106.047957] Call trace:
    [ 3106.049522]  try_grab_folio+0x11c/0x188
    [ 3106.051996]  follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
    [ 3106.055527]  follow_page_mask+0x1a0/0x2b8
    [ 3106.058118]  __get_user_pages+0xf0/0x348
    [ 3106.060647]  faultin_page_range+0xb0/0x360
    [ 3106.063651]  do_madvise+0x340/0x598
    
    Let's make huge_pte_lockptr() effectively use the same PT locks as any
    core-mm page table walker would.  Add ptep_lockptr() to obtain the PTE
    page table lock using a pte pointer -- unfortunately we cannot convert
    pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables
    we can have with CONFIG_HIGHPTE.
    
    Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such
    that when e.g., CONFIG_PGTABLE_LEVELS==2 with
    PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE will work as expected.  Document
    why that works.
    
    There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
    folio being mapped using two PTE page tables.  While hugetlb wants to take
    the PMD table lock, core-mm would grab the PTE table lock of one of the two
    PTE page tables.  In such corner cases, we have to make sure that both
    locks match, which is (fortunately!) currently guaranteed for 8xx as it
    does not support SMP and consequently doesn't use split PT locks.
    
    [1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
    
    Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com
    Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu [+ + +]

Author: Waiman Long <longman@redhat.com>
Date:   Tue Aug 6 12:41:07 2024 -0400

    mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
    
    commit d75abd0d0bc29e6ebfebbf76d11b4067b35844af upstream.
    
    The memory_failure_cpu structure is a per-cpu structure.  Access to its
    content requires the use of get_cpu_var() to lock in the current CPU and
    disable preemption.  The use of a regular spinlock_t for locking purpose
    is fine for a non-RT kernel.
    
    Since the integration of RT spinlock support into the v5.15 kernel, a
    spinlock_t in an RT kernel becomes a sleeping lock, and taking a
    sleeping lock in a preemption-disabled context is illegal, resulting
    in the following kind of warning.
    
      [12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
      [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
      [12135.732252] preempt_count: 1, expected: 0
      [12135.732255] RCU nest depth: 2, expected: 2
        :
      [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
      [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
      [12135.732433] Call Trace:
      [12135.732436]  <TASK>
      [12135.732450]  dump_stack_lvl+0x57/0x81
      [12135.732461]  __might_resched.cold+0xf4/0x12f
      [12135.732479]  rt_spin_lock+0x4c/0x100
      [12135.732491]  memory_failure_queue+0x40/0xe0
      [12135.732503]  ghes_do_memory_failure+0x53/0x390
      [12135.732516]  ghes_do_proc.constprop.0+0x229/0x3e0
      [12135.732575]  ghes_proc+0xf9/0x1a0
      [12135.732591]  ghes_notify_hed+0x6a/0x150
      [12135.732602]  notifier_call_chain+0x43/0xb0
      [12135.732626]  blocking_notifier_call_chain+0x43/0x60
      [12135.732637]  acpi_ev_notify_dispatch+0x47/0x70
      [12135.732648]  acpi_os_execute_deferred+0x13/0x20
      [12135.732654]  process_one_work+0x41f/0x500
      [12135.732695]  worker_thread+0x192/0x360
      [12135.732715]  kthread+0x111/0x140
      [12135.732733]  ret_from_fork+0x29/0x50
      [12135.732779]  </TASK>
    
    Fix it by using a raw_spinlock_t for locking instead.
    
    Also move the pr_err() out of the lock critical section and after
    put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep
    with this call.
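    
    The pattern of deciding under the lock and reporting after it can be
    sketched in user space as below.  This is only an illustration, not
    the kernel code: the ex_ names are hypothetical, a C11 atomic_flag
    spin loop stands in for raw_spinlock_t, and fprintf() stands in for
    pr_err().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define EX_QUEUE_SIZE 4

/* Hypothetical fixed-size work queue guarded by a busy-waiting lock. */
struct ex_queue {
	atomic_flag lock;
	unsigned long pfn[EX_QUEUE_SIZE];
	unsigned int count;
};

void ex_lock(struct ex_queue *q)
{
	/* Spin until the flag was previously clear; never sleeps. */
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
}

void ex_unlock(struct ex_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Returns true if the pfn was queued, false if the queue was full. */
bool ex_queue_pfn(struct ex_queue *q, unsigned long pfn)
{
	bool queued = false;

	ex_lock(q);
	if (q->count < EX_QUEUE_SIZE) {
		q->pfn[q->count++] = pfn;
		queued = true;
	}
	ex_unlock(q);

	/* Only the outcome is recorded under the lock; the potentially
	 * slow message is emitted after the lock has been dropped. */
	if (!queued)
		fprintf(stderr, "queue overflow, dropping pfn %#lx\n", pfn);

	return queued;
}
```

    Keeping the critical section down to the queue update bounds the
    time other CPUs may spin on the lock.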
    
    [longman@redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
      Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
    Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
    Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
    Signed-off-by: Waiman Long <longman@redhat.com>
    Acked-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Juri Lelli <juri.lelli@redhat.com>
    Cc: Len Brown <len.brown@intel.com>
    Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/numa: no task_numa_fault() call if PMD is changed [+ + +]

Author: Zi Yan <ziy@nvidia.com>
Date:   Fri Aug 9 10:59:05 2024 -0400

    mm/numa: no task_numa_fault() call if PMD is changed
    
    commit fd8c35a92910f4829b7c99841f39b1b952c259d5 upstream.
    
    When handling a numa page fault, task_numa_fault() should be called by a
    process that restores the page table of the faulted folio to avoid
    duplicated stats counting.  Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
    fault handling") restructured do_huge_pmd_numa_page() and did not avoid
    task_numa_fault() call in the second page table check after a numa
    migration failure.  Fix it by making all !pmd_same() return immediately.
    
    This issue can cause task_numa_fault() to be called more than necessary
    and lead to unexpected numa balancing results (It is hard to tell whether
    the issue will cause positive or negative performance impact due to
    duplicated numa fault counting).
    
    Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com
    Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
    Reported-by: "Huang, Ying" <ying.huang@intel.com>
    Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/
    Signed-off-by: Zi Yan <ziy@nvidia.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/numa: no task_numa_fault() call if PTE is changed [+ + +]

Author: Zi Yan <ziy@nvidia.com>
Date:   Fri Aug 9 10:59:04 2024 -0400

    mm/numa: no task_numa_fault() call if PTE is changed
    
    commit 40b760cfd44566bca791c80e0720d70d75382b84 upstream.
    
    When handling a numa page fault, task_numa_fault() should be called by a
    process that restores the page table of the faulted folio to avoid
    duplicated stats counting.  Commit b99a342d4f11 ("NUMA balancing: reduce
    TLB flush via delaying mapping on hint page fault") restructured
    do_numa_page() and did not avoid task_numa_fault() call in the second page
    table check after a numa migration failure.  Fix it by making all
    !pte_same() return immediately.
    
    This issue can cause task_numa_fault() to be called more than necessary
    and lead to unexpected numa balancing results (It is hard to tell whether
    the issue will cause positive or negative performance impact due to
    duplicated numa fault counting).
    
    Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com
    Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
    Signed-off-by: Zi Yan <ziy@nvidia.com>
    Reported-by: "Huang, Ying" <ying.huang@intel.com>
    Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0 [+ + +]

Author: Hailong Liu <hailong.liu@oppo.com>
Date:   Thu Aug 8 20:19:56 2024 +0800

    mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
    
    commit 61ebe5a747da649057c37be1c37eb934b4af79ca upstream.
    
    The __vmap_pages_range_noflush() assumes its argument pages** contains
    pages with the same page shift.  However, since commit e9c3cda4d86e ("mm,
    vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes
    __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation
    failed for high order, the pages** may contain two different page shifts
    (high order and order-0).  This could lead __vmap_pages_range_noflush() to
    perform incorrect mappings, potentially resulting in memory corruption.
    
    Users might encounter this as follows (vmap_allow_huge = true, 2M is for
    PMD_SIZE):
    
    kvmalloc(2M, __GFP_NOFAIL|GFP_X)
        __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
            vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
                vmap_pages_range()
                    vmap_pages_range_noflush()
                        __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
    
    We can remove the fallback code because if a high-order allocation fails,
    __vmalloc_node_range_noprof() will retry with order-0.  It is therefore
    unnecessary to fall back to order-0 here, so fix this by removing the
    fallback code.
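    
    Why mixed page shifts break the mapping can be shown with a small
    user-space model.  It assumes, as a simplification, that a mapper
    given page_shift only looks at every (1 << (page_shift - PAGE_SHIFT))th
    entry of pages[] and maps a physically contiguous block from there.
    The ex_ names and the use of plain PFN numbers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_PAGE_SHIFT 12

/*
 * pfn[] models the pages** array as page frame numbers.  A mapper using
 * page_shift maps one block per 'step' entries, starting at pfn[i] and
 * assuming the following step-1 frames are physically contiguous.
 * Returns true only if that assumption holds for every block.
 */
bool ex_huge_mapping_is_correct(const unsigned long *pfn, size_t nr,
				unsigned int page_shift)
{
	size_t step = (size_t)1 << (page_shift - EX_PAGE_SHIFT);

	for (size_t i = 0; i < nr; i += step)
		for (size_t j = 0; j < step && i + j < nr; j++)
			if (pfn[i + j] != pfn[i] + j)
				return false;
	return true;
}
```

    With pages obtained from the order-0 fallback the frames are
    generally not contiguous, so the check fails for a large page_shift
    and only passes when page_shift equals EX_PAGE_SHIFT.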
    
    Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com
    Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
    Signed-off-by: Hailong Liu <hailong.liu@oppo.com>
    Reported-by: Tangquan Zheng <zhengtangquan@oppo.com>
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Acked-by: Barry Song <baohua@kernel.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: fix endless reclaim on machines with unaccepted memory [+ + +]

Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Fri Aug 9 14:48:47 2024 +0300

    mm: fix endless reclaim on machines with unaccepted memory
    
    commit 807174a93d24c456503692dc3f5af322ee0b640a upstream.
    
    Unaccepted memory is considered unusable free memory, which is not counted
    as free on the zone watermark check.  This causes get_page_from_freelist()
    to accept more memory to hit the high watermark, but it creates problems
    in the reclaim path.
    
    The reclaim path encounters a failed zone watermark check and attempts to
    reclaim memory.  This is usually successful, but if there is little or no
    reclaimable memory, it can result in endless reclaim with little to no
    progress.  This can occur early in the boot process, just after start of
    the init process when the only reclaimable memory is the page cache of the
    init executable and its libraries.
    
    Treat unaccepted memory as free from the watermark check's point of view.  This way
    unaccepted memory will never be the trigger of memory reclaim.  Accept
    more memory in the get_page_from_freelist() if needed.
    
    Link: https://lkml.kernel.org/r/20240809114854.3745464-2-kirill.shutemov@linux.intel.com
    Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reported-by: Jianxiong Gao <jxgao@google.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Tested-by: Jianxiong Gao <jxgao@google.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: <stable@vger.kernel.org>    [6.5+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: dw_mmc: allow biu and ciu clocks to defer [+ + +]

Author: Ben Whitten <ben.whitten@gmail.com>
Date:   Sun Aug 11 22:22:11 2024 +0100

    mmc: dw_mmc: allow biu and ciu clocks to defer
    
    commit 6275c7bc8dd07644ea8142a1773d826800f0f3f7 upstream.
    
    Fix a race condition: if the clock provider comes up after mmc is
    probed, mmc fails without retrying.
    When the clk source returns the DEFER error, pass it on up the chain.
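    
    A minimal sketch of that error handling follows, assuming the two
    clocks stay optional for every other error.  The lookup callback and
    all ex_ names are hypothetical stand-ins, not the driver's code; 517
    is the kernel-internal value of EPROBE_DEFER.

```c
#include <stddef.h>

#define EX_ENOENT       2
#define EX_EPROBE_DEFER 517 /* "provider not ready yet, retry probe later" */

/* Hypothetical clock lookup: 0 on success, negative errno on failure. */
typedef int (*ex_clk_get_fn)(const char *name);

int ex_probe_clocks(ex_clk_get_fn clk_get)
{
	static const char *const names[] = { "biu", "ciu" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int ret = clk_get(names[i]);

		/* Deferral must reach the caller so the probe is retried. */
		if (ret == -EX_EPROBE_DEFER)
			return ret;

		/* Any other error: the clock is optional, carry on. */
	}
	return 0;
}

/* Test doubles modelling a missing, a not-yet-ready and a present clock. */
int ex_clk_missing(const char *name)   { (void)name; return -EX_ENOENT; }
int ex_clk_not_ready(const char *name) { (void)name; return -EX_EPROBE_DEFER; }
int ex_clk_ok(const char *name)        { (void)name; return 0; }
```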
    
    Fixes: f90a0612f0e1 ("mmc: dw_mmc: lookup for optional biu and ciu clocks")
    Signed-off-by: Ben Whitten <ben.whitten@gmail.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240811212212.123255-1-ben.whitten@gmail.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mmc: mmc_test: Fix NULL dereference on allocation failure [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Tue Aug 20 11:44:08 2024 +0300

    mmc: mmc_test: Fix NULL dereference on allocation failure
    
    [ Upstream commit a1e627af32ed60713941cbfc8075d44cad07f6dd ]
    
    If the "test->highmem = alloc_pages()" allocation fails then calling
    __free_pages(test->highmem) will result in a NULL dereference.  Also
    change the error code to -ENOMEM instead of returning success.
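    
    The shape of such an error path can be sketched in user space as
    below.  The allocator hook and the ex_ names are hypothetical, and
    malloc()/free() stand in for the kernel allocators; unlike
    __free_pages(), free(NULL) is harmless, so the sketch only shows the
    control flow.

```c
#include <errno.h>
#include <stdlib.h>

struct ex_test {
	void *buffer;
	void *highmem;
};

/* Hypothetical allocator hook so that a failure can be injected. */
typedef void *(*ex_alloc_fn)(size_t size);

int ex_run_test(ex_alloc_fn alloc_highmem)
{
	struct ex_test test = { 0 };
	int ret = 0;

	test.buffer = malloc(64);
	if (!test.buffer)
		return -ENOMEM;

	test.highmem = alloc_highmem(64);
	if (!test.highmem) {
		ret = -ENOMEM;    /* report the failure instead of success */
		goto free_buffer; /* nothing to release for highmem */
	}

	/* ... the test body would run here ... */

	free(test.highmem);
free_buffer:
	free(test.buffer);
	return ret;
}

/* Allocator that always fails, for exercising the error path. */
void *ex_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}
```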
    
    Fixes: 2661081f5ab9 ("mmc_test: highmem tests")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://lore.kernel.org/r/8c90be28-67b4-4b0d-a105-034dc72a0b31@stanley.mountain
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mmc: mtk-sd: receive cmd8 data when hs400 tuning fail [+ + +]

Author: Mengqi Zhang <mengqi.zhang@mediatek.com>
Date:   Tue Jul 16 09:37:04 2024 +0800

    mmc: mtk-sd: receive cmd8 data when hs400 tuning fail
    
    commit 9374ae912dbb1eed8139ed75fd2c0f1b30ca454d upstream.
    
    When we use cmd8 as the tuning command in hs400 mode, the command
    response sent back by some eMMC devices cannot be correctly sampled
    by the MTK eMMC controller at some weak sample timing. In this case,
    a command timeout error may occur. So we must receive the following
    data to make sure the next cmd8 is sent correctly.
    
    Signed-off-by: Mengqi Zhang <mengqi.zhang@mediatek.com>
    Fixes: c4ac38c6539b ("mmc: mtk-sd: Add HS400 online tuning support")
    Cc: stable@vger.stable.com
    Link: https://lore.kernel.org/r/20240716013704.10578-1-mengqi.zhang@mediatek.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size [+ + +]

Author: Eugene Syromiatnikov <esyr@redhat.com>
Date:   Mon Aug 12 08:51:23 2024 +0200

    mptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size
    
    [ Upstream commit 655111b838cdabdb604f3625a9ff08c5eedb11da ]
    
    ssn_offset field is u32 and is placed into the netlink response with
    nla_put_u32(), but only 2 bytes are reserved for the attribute payload
    in subflow_get_info_size() (even though it makes no difference
    in the end, as it is aligned up to 4 bytes).  Supply the correct
    argument to the relevant nla_total_size() call to make it less
    confusing.
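    
    The "no difference in the end" remark follows from the netlink
    attribute size arithmetic: a 4-byte attribute header plus the payload,
    rounded up to a multiple of 4.  The sketch below re-implements that
    arithmetic in user space; the EX_ and ex_ names are local to the
    example.

```c
#define EX_NLA_ALIGNTO 4
#define EX_NLA_ALIGN(len) \
	(((len) + EX_NLA_ALIGNTO - 1) & ~(EX_NLA_ALIGNTO - 1))
#define EX_NLA_HDRLEN 4 /* u16 length + u16 type */

/* Bytes taken by one attribute: header plus payload, padded to 4 bytes. */
int ex_nla_total_size(int payload)
{
	return EX_NLA_ALIGN(EX_NLA_HDRLEN + payload);
}
```

    Both ex_nla_total_size(2) and ex_nla_total_size(4) evaluate to 8, so
    reserving 2 bytes for a u32 payload happened to reserve enough space.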
    
    Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace")
    Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240812065024.GA19719@asgard.redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mptcp: pm: avoid possible UaF when selecting endp [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:32 2024 +0200

    mptcp: pm: avoid possible UaF when selecting endp
    
    commit 48e50dcbcbaaf713d82bf2da5c16aeced94ad07d upstream.
    
    select_local_address() and select_signal_address() both select an
    endpoint entry from the list inside an RCU protected section, but return
    a reference to it, to be read later on. If the entry is dereferenced
    after the RCU unlock, reading info could cause a Use-after-Free.
    
    A simple solution is to copy the required info while inside the RCU
    protected section to avoid any risk of UaF later. The address ID might
    need to be modified later to handle the ID0 case, so a copy seems
    OK to deal with.
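    
    The copy-instead-of-reference idea can be sketched as follows.  This
    is a simplified user-space model: the structures and ex_ names are
    hypothetical, and the empty lock helpers merely mark where the
    RCU-protected section would begin and end.

```c
#include <stdbool.h>
#include <stddef.h>

struct ex_addr {
	unsigned char id;
	unsigned int ip;
};

struct ex_entry {
	struct ex_addr addr;
	bool in_use;
};

/* Placeholders marking the protected section; no real locking here. */
void ex_read_lock(void) {}
void ex_read_unlock(void) {}

/*
 * Copies the first unused address into *out and returns true, or
 * returns false if none is found.  The caller never gets a pointer
 * into the list, so nothing is dereferenced after the unlock.
 */
bool ex_select_address(const struct ex_entry *list, size_t n,
		       struct ex_addr *out)
{
	bool found = false;

	ex_read_lock();
	for (size_t i = 0; i < n; i++) {
		if (!list[i].in_use) {
			*out = list[i].addr; /* copy while still protected */
			found = true;
			break;
		}
	}
	ex_read_unlock();

	return found;
}
```

    Because the caller owns the copy, it can also rewrite the ID (for
    example to 0) without touching the shared entry.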
    
    Reported-by: Paolo Abeni <pabeni@redhat.com>
    Closes: https://lore.kernel.org/45cd30d3-7710-491c-ae4d-a1368c00beb1@redhat.com
    Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-14-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: check add_addr_accept_max before accepting new ADD_ADDR [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:28 2024 +0200

    mptcp: pm: check add_addr_accept_max before accepting new ADD_ADDR
    
    commit 0137a3c7c2ea3f9df8ebfc65d78b4ba712a187bb upstream.
    
    The limits might have changed in between, so it is best to check them
    before accepting a new ADD_ADDR.
    
    Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-10-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: fullmesh: select the right ID later [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:30 2024 +0200

    mptcp: pm: fullmesh: select the right ID later
    
    commit 09355f7abb9fbfc1a240be029837921ea417bf4f upstream.
    
    When reacting upon the reception of an ADD_ADDR, the in-kernel PM first
    looks for fullmesh endpoints. If there are some, it will pick them,
    using their entry ID.
    
    It should set the ID 0 when using the endpoint corresponding to the
    initial subflow; it is a special case imposed by the MPTCP specs.
    
    Note that msk->mpc_endpoint_id might not be set when receiving the first
    ADD_ADDR from the server. So better to compare the addresses.
    
    Fixes: 1a0d6136c5f0 ("mptcp: local addresses fullmesh")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-12-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: only decrement add_addr_accepted for MPJ req [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:27 2024 +0200

    mptcp: pm: only decrement add_addr_accepted for MPJ req
    
    commit 1c1f721375989579e46741f59523e39ec9b2a9bd upstream.
    
    Adding the following warning ...
    
      WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)
    
    ... before decrementing the add_addr_accepted counter helped to find a
    bug when running the "remove single subflow" subtest from the
    mptcp_join.sh selftest.
    
    Removing a 'subflow' endpoint will first trigger a RM_ADDR, then the
    subflow closure. Before this patch, and upon the reception of the
    RM_ADDR, the other peer will then try to decrement this
    add_addr_accepted. That's not correct because the attached subflows have
    not been created upon the reception of an ADD_ADDR.
    
    A way to solve that is to decrement the counter only if the attached
    subflow was an MP_JOIN to a remote id that was not 0, and initiated by
    the host receiving the RM_ADDR.
    
    Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-9-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: only in-kernel cannot have entries with ID 0 [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:29 2024 +0200

    mptcp: pm: only in-kernel cannot have entries with ID 0
    
    commit ca6e55a703ca2894611bb5c5bca8bfd2290fd91e upstream.
    
    The ID 0 is specific to each MPTCP connection, so the per-netns
    entries cannot have this special ID 0.
    
    But that's different for the userspace PM where the entries are per
    connection, they can then use this special ID 0.
    
    Fixes: f40be0db0b76 ("mptcp: unify pm get_flags_and_ifindex_by_id")
    Cc: stable@vger.kernel.org
    Acked-by: Geliang Tang <geliang@kernel.org>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-11-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: only mark 'subflow' endp as available [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:26 2024 +0200

    mptcp: pm: only mark 'subflow' endp as available
    
    commit 322ea3778965da72862cca2a0c50253aacf65fe6 upstream.
    
    Adding the following warning ...
    
      WARN_ON_ONCE(msk->pm.local_addr_used == 0)
    
    ... before decrementing the local_addr_used counter helped to find a bug
    when running the "remove single address" subtest from the mptcp_join.sh
    selftests.
    
    Removing a 'signal' endpoint will trigger the removal of all subflows
    linked to this endpoint via mptcp_pm_nl_rm_addr_or_subflow() with
    rm_type == MPTCP_MIB_RMSUBFLOW. This will decrement the local_addr_used
    counter, which is wrong in this case because this counter is linked to
    'subflow' endpoints, and here it is a 'signal' endpoint that is being
    removed.
    
    Now, the counter is decremented, only if the ID is being used outside
    of mptcp_pm_nl_rm_addr_or_subflow(), only for 'subflow' endpoints, and
    if the ID is not 0 -- local_addr_used does not take these into
    account. The ID is marked as available and the counter is decremented
    regardless of whether a subflow using this ID is currently available,
    because the subflow could have been closed before.
    
    Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-8-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: re-using ID of unused flushed subflows [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:23 2024 +0200

    mptcp: pm: re-using ID of unused flushed subflows
    
    commit ef34a6ea0cab1800f4b3c9c3c2cefd5091e03379 upstream.
    
    If no subflows are attached to the 'subflow' endpoints that are being
    flushed, the corresponding addr IDs will not be marked as available
    again.
    
    Mark all IDs as being available when flushing all the 'subflow'
    endpoints, and reset local_addr_used counter to cover these cases.
    
    Note that mptcp_pm_remove_addrs_and_subflows() helper is only called for
    flushing operations, not to remove a specific set of addresses and
    subflows.
    
    Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-5-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: re-using ID of unused removed ADD_ADDR [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:19 2024 +0200

    mptcp: pm: re-using ID of unused removed ADD_ADDR
    
    commit e255683c06df572ead96db5efb5d21be30c0efaa upstream.
    
    If no subflow is attached to the 'signal' endpoint that is being
    removed, the addr ID will not be marked as available again.
    
    Mark the linked ID as available when removing the address entry from the
    list to cover this case.
    
    Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-1-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: re-using ID of unused removed subflows [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:21 2024 +0200

    mptcp: pm: re-using ID of unused removed subflows
    
    commit edd8b5d868a4d459f3065493001e293901af758d upstream.
    
    If no subflow is attached to the 'subflow' endpoint that is being
    removed, the addr ID will not be marked as available again.
    
    Mark the linked ID as available when removing the 'subflow' endpoint if
    no subflow is attached to it.
    
    While at it, the local_addr_used counter is decremented if the ID was
    marked as being used to reflect the reality, but also to allow adding
    new endpoints after that.
    
    Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-3-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: pm: remove mptcp_pm_remove_subflow() [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:25 2024 +0200

    mptcp: pm: remove mptcp_pm_remove_subflow()
    
    commit f448451aa62d54be16acb0034223c17e0d12bc69 upstream.
    
    This helper is confusing. It is in pm.c, but it is specific to the
    in-kernel PM and it cannot be used by the userspace one. Also, it simply
    calls one in-kernel specific function with the PM lock, while the
    similar mptcp_pm_remove_addr() helper requires the PM lock.
    
    What's left is the pr_debug(), which is not that useful, because a
    similar one is present in the only function called by this helper:
    
      mptcp_pm_nl_rm_subflow_received()
    
    After these modifications, this helper can be marked as 'static', and
    the lock can be taken only once in mptcp_pm_flush_addrs_and_subflows().
    
    Note that it is not a bug fix, but it will help backporting the
    following commits.
    
    Fixes: 0ee4261a3681 ("mptcp: implement mptcp_pm_remove_subflow")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-7-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mseal: fix is_madv_discard() [+ + +]

Author: Pedro Falcato <pedro.falcato@gmail.com>
Date:   Wed Aug 7 18:33:35 2024 +0100

    mseal: fix is_madv_discard()
    
    commit e46bc2e7eb90a370bc27fa2fd98cb8251e7da1ec upstream.
    
    is_madv_discard did its check wrong. MADV_ flags are not bitwise,
    they're normal sequential numbers. So, for instance:
            behavior & (/* ... */ | MADV_REMOVE)
    
    tagged both MADV_REMOVE and MADV_RANDOM (bit 0 set) as discard
    operations.
    
    As a result the kernel could erroneously block certain madvises (e.g.
    MADV_RANDOM or MADV_HUGEPAGE) on sealed VMAs due to them sharing bits
    with blocked MADV operations (e.g. REMOVE or WIPEONFORK).
    
    This is obviously incorrect, so use a switch statement instead.
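    
    The difference between the two checks can be sketched in user space as
    follows. This is only an illustration, not the kernel code: the EX_
    names are local to the sketch (their values mirror the Linux uapi
    advice numbers), and the set of "discard" operations is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Advice numbers mirroring the Linux uapi values; the EX_ prefix marks
 * them as local to this sketch. */
enum {
	EX_MADV_RANDOM          = 1,
	EX_MADV_DONTNEED        = 4,
	EX_MADV_FREE            = 8,
	EX_MADV_REMOVE          = 9,
	EX_MADV_DONTFORK        = 10,
	EX_MADV_HUGEPAGE        = 14,
	EX_MADV_WIPEONFORK      = 18,
	EX_MADV_DONTNEED_LOCKED = 24,
};

/* Flawed pattern: treats sequential advice numbers as a bit mask, so any
 * value sharing a bit with the mask is reported as a discard operation. */
static bool ex_is_discard_bitwise(int behavior)
{
	return behavior & (EX_MADV_FREE | EX_MADV_DONTNEED |
			   EX_MADV_DONTNEED_LOCKED | EX_MADV_REMOVE |
			   EX_MADV_DONTFORK | EX_MADV_WIPEONFORK);
}

/* Exact-match pattern: only the listed values count as discard. */
static bool ex_is_discard_switch(int behavior)
{
	switch (behavior) {
	case EX_MADV_FREE:
	case EX_MADV_DONTNEED:
	case EX_MADV_DONTNEED_LOCKED:
	case EX_MADV_REMOVE:
	case EX_MADV_DONTFORK:
	case EX_MADV_WIPEONFORK:
		return true;
	default:
		return false;
	}
}
```

    With these definitions, the bitwise variant reports EX_MADV_RANDOM and
    EX_MADV_HUGEPAGE as discard operations, while the switch variant does
    not.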
    
    Link: https://lkml.kernel.org/r/20240807173336.2523757-1-pedro.falcato@gmail.com
    Link: https://lkml.kernel.org/r/20240807173336.2523757-2-pedro.falcato@gmail.com
    Fixes: 8be7258aad44 ("mseal: add mseal syscall")
    Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
    Tested-by: Jeff Xu <jeffxu@chromium.org>
    Reviewed-by: Jeff Xu <jeffxu@chromium.org>
    Cc: Kees Cook <kees@kernel.org>
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/mlx5: Fix IPsec RoCE MPV trace call [+ + +]

Author: Patrisious Haddad <phaddad@nvidia.com>
Date:   Thu Aug 15 10:16:11 2024 +0300

    net/mlx5: Fix IPsec RoCE MPV trace call
    
    [ Upstream commit 607e1df7bd47fe91cab85a97f57870a26d066137 ]
    
    Prevent the call trace below from happening, by not allowing IPsec
    creation over a slave, if master device doesn't support IPsec.
    
    WARNING: CPU: 44 PID: 16136 at kernel/locking/rwsem.c:240 down_read+0x75/0x94
    Modules linked in: esp4_offload esp4 act_mirred act_vlan cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa mst_pciconf(OE) nfsv3 nfs_acl nfs lockd grace fscache netfs xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill cuse fuse rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_umad ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul mlx5_ib ghash_clmulni_intel sha1_ssse3 dell_smbios ib_uverbs aesni_intel crypto_simd dcdbas wmi_bmof dell_wmi_descriptor cryptd pcspkr ib_core acpi_ipmi sp5100_tco ccp i2c_piix4 ipmi_si ptdma k10temp ipmi_devintf ipmi_msghandler acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect mlx5_core sysimgblt fb_sys_fops cec
     ahci libahci mlxfw drm pci_hyperv_intf libata tg3 sha256_ssse3 tls megaraid_sas i2c_algo_bit psample wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mst_pci]
    CPU: 44 PID: 16136 Comm: kworker/44:3 Kdump: loaded Tainted: GOE 5.15.0-20240509.el8uek.uek7_u3_update_v6.6_ipsec_bf.x86_64 #2
    Hardware name: Dell Inc. PowerEdge R7525/074H08, BIOS 2.0.3 01/15/2021
    Workqueue: events xfrm_state_gc_task
    RIP: 0010:down_read+0x75/0x94
    Code: 00 48 8b 45 08 65 48 8b 14 25 80 fc 01 00 83 e0 02 48 09 d0 48 83 c8 01 48 89 45 08 5d 31 c0 89 c2 89 c6 89 c7 e9 cb 88 3b 00 <0f> 0b 48 8b 45 08 a8 01 74 b2 a8 02 75 ae 48 89 c2 48 83 ca 02 f0
    RSP: 0018:ffffb26387773da8 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffffa08b658af900 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: ff886bc5e1366f2f RDI: 0000000000000000
    RBP: ffffa08b658af940 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0a9bfb31540
    R13: ffffa0a9bfb37900 R14: 0000000000000000 R15: ffffa0a9bfb37905
    FS:  0000000000000000(0000) GS:ffffa0a9bfb00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055a45ed814e8 CR3: 000000109038a000 CR4: 0000000000350ee0
    Call Trace:
     <TASK>
     ? show_trace_log_lvl+0x1d6/0x2f9
     ? show_trace_log_lvl+0x1d6/0x2f9
     ? mlx5_devcom_for_each_peer_begin+0x29/0x60 [mlx5_core]
     ? down_read+0x75/0x94
     ? __warn+0x80/0x113
     ? down_read+0x75/0x94
     ? report_bug+0xa4/0x11d
     ? handle_bug+0x35/0x8b
     ? exc_invalid_op+0x14/0x75
     ? asm_exc_invalid_op+0x16/0x1b
     ? down_read+0x75/0x94
     ? down_read+0xe/0x94
     mlx5_devcom_for_each_peer_begin+0x29/0x60 [mlx5_core]
     mlx5_ipsec_fs_roce_tx_destroy+0xb1/0x130 [mlx5_core]
     tx_destroy+0x1b/0xc0 [mlx5_core]
     tx_ft_put+0x53/0xc0 [mlx5_core]
     mlx5e_xfrm_free_state+0x45/0x90 [mlx5_core]
     ___xfrm_state_destroy+0x10f/0x1a2
     xfrm_state_gc_task+0x81/0xa9
     process_one_work+0x1f1/0x3c6
     worker_thread+0x53/0x3e4
     ? process_one_work.cold+0x46/0x3c
     kthread+0x127/0x144
     ? set_kthread_struct+0x60/0x52
     ret_from_fork+0x22/0x2d
     </TASK>
    ---[ end trace 5ef7896144d398e1 ]---
    
    Fixes: dfbd229abeee ("net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic")
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20240815071611.2211873-5-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: SD, Do not query MPIR register if no sd_group [+ + +]

Author: Tariq Toukan <tariqt@nvidia.com>
Date:   Thu Aug 8 17:41:02 2024 +0300

    net/mlx5: SD, Do not query MPIR register if no sd_group
    
    [ Upstream commit c31fe2b5095d8c84562ce90db07600f7e9f318df ]
    
    Unconditionally calling the MPIR query on BF separate mode yields the FW
    syndrome below [1]. Do not call it unless admin clearly specified the SD
    group, i.e. expressing the intention of using the multi-PF netdev
    feature.
    
    This fix covers cases not covered in
    commit fca3b4791850 ("net/mlx5: Do not query MPIR on embedded CPU function").
    
    [1]
    mlx5_cmd_out_err:808:(pid 8267): ACCESS_REG(0x805) op_mod(0x1) failed,
    status bad system state(0x4), syndrome (0x685f19), err(-5)
    
    Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation")
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Gal Pressman <gal@nvidia.com>
    Link: https://patch.msgid.link/20240808144107.2095424-2-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Correctly report errors for ethtool rx flows [+ + +]

Author: Cosmin Ratiu <cratiu@nvidia.com>
Date:   Thu Aug 8 17:41:05 2024 +0300

    net/mlx5e: Correctly report errors for ethtool rx flows
    
    [ Upstream commit cbc796be1779c4dbc9a482c7233995e2a8b6bfb3 ]
    
    Previously, an ethtool rx flow with no attrs would not be added to the
    NIC as it has no rules to configure the hw with, but it would be
    reported as successful to the caller (return code 0). This is confusing
    for the user as ethtool then reports "Added rule $num", but no rule was
    actually added.
    
    This change corrects that by instead reporting these wrong rules as
    -EINVAL.
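    
    The intended behaviour can be sketched as below. This is a hypothetical
    user-space model, not the mlx5e code: struct ex_rx_flow, struct ex_nic
    and ex_add_flow() are names invented for the illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flow description: only the number of match attributes. */
struct ex_rx_flow {
	unsigned int num_attrs;
};

/* Hypothetical device state: counts the rules actually installed. */
struct ex_nic {
	unsigned int installed_rules;
};

/* Reject a flow with nothing to match on instead of reporting success
 * for a rule that was never installed. */
static int ex_add_flow(struct ex_nic *nic, const struct ex_rx_flow *flow)
{
	if (flow->num_attrs == 0)
		return -EINVAL;

	nic->installed_rules++;
	return 0;
}
```

    The caller then sees a failure exactly when no rule was added, so the
    reported result and the device state stay consistent.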
    
    Fixes: b29c61dac3a2 ("net/mlx5e: Ethtool steering flow validation refactoring")
    Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
    Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20240808144107.2095424-5-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: Take state lock during tx timeout reporter [+ + +]

Author: Dragos Tatulea <dtatulea@nvidia.com>
Date:   Thu Aug 8 17:41:04 2024 +0300

    net/mlx5e: Take state lock during tx timeout reporter
    
    [ Upstream commit e6b5afd30b99b43682a7764e1a74a42fe4d5f4b3 ]
    
    mlx5e_safe_reopen_channels() requires the state lock taken. The
    change referenced in the Fixes tag removed the lock to fix another
    issue. This patch adds it back but at a later point (when calling
    mlx5e_safe_reopen_channels()) to avoid the deadlock referenced in the
    Fixes tag.
    
    Fixes: eab0da38912e ("net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_work")
    Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
    Link: https://lore.kernel.org/all/ZplpKq8FKi3vwfxv@gmail.com/T/
    Reviewed-by: Breno Leitao <leitao@debian.org>
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20240808144107.2095424-4-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: XPS, Fix oversight of Multi-PF Netdev changes [+ + +]

Author: Carolina Jubran <cjubran@nvidia.com>
Date:   Thu Aug 15 10:16:10 2024 +0300

    net/mlx5e: XPS, Fix oversight of Multi-PF Netdev changes
    
    [ Upstream commit a07e953dafe5ebd88942dc861dfb06eaf055fb07 ]
    
    The offending commit overlooked the Multi-PF Netdev changes.
    
    Revert mlx5e_set_default_xps_cpumasks to incorporate Multi-PF Netdev
    changes.
    
    Fixes: bcee093751f8 ("net/mlx5e: Modifying channels number and updating TX queues")
    Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20240815071611.2211873-4-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: axienet: Fix register defines comment description [+ + +]

Author: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Date:   Fri Aug 9 11:56:09 2024 +0530

    net: axienet: Fix register defines comment description
    
    [ Upstream commit 9ff2f816e2aa65ca9a1cdf0954842f8173c0f48d ]
    
    In the axiethernet header, fix the register define comment descriptions
    to be in line with the IP documentation. This updates the MAC
    configuration register, MDIO configuration register and frame filter
    control descriptions.
    
    Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
    Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: microchip: fix PTP config failure when using multiple ports [+ + +]

Author: Martin Whitaker <foss@martin-whitaker.me.uk>
Date:   Sat Aug 17 10:41:41 2024 +0100

    net: dsa: microchip: fix PTP config failure when using multiple ports
    
    commit 6efea5135417ae8194485d1d05ea79a21cf1a11c upstream.
    
    When performing the port_hwtstamp_set operation, ptp_schedule_worker()
    will be called if hardware timestamping is enabled on any of the ports.
    When using multiple ports for PTP, port_hwtstamp_set is executed for
    each port. When called for the first time ptp_schedule_worker() returns
    0. On subsequent calls it returns 1, indicating the worker is already
    scheduled. Currently the ksz driver treats 1 as an error and fails to
    complete the port_hwtstamp_set operation, thus leaving the timestamping
    configuration for those ports unchanged.
    
    This patch fixes this by ignoring the ptp_schedule_worker() return
    value.
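    
    The failure mode can be modelled in user space as follows. This is a
    sketch under assumptions: ex_schedule_worker() is a stand-in that only
    imitates the return values described above (0 on the first call, 1 when
    already scheduled), and the two handlers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the shared PTP worker state. */
struct ex_worker {
	bool scheduled;
};

/* Imitates the described return values: 0 the first time, 1 when the
 * worker was already scheduled. Neither value is an error. */
static int ex_schedule_worker(struct ex_worker *w)
{
	int was_scheduled = w->scheduled ? 1 : 0;

	w->scheduled = true;
	return was_scheduled;
}

/* Flawed pattern: treats any non-zero return as a failure, so the
 * per-port configuration is skipped for every port after the first. */
static int ex_hwtstamp_set_strict(struct ex_worker *w, bool *port_enabled)
{
	int ret = ex_schedule_worker(w);

	if (ret)
		return ret;

	*port_enabled = true;
	return 0;
}

/* Pattern described by the commit: ignore the return value. */
static int ex_hwtstamp_set_relaxed(struct ex_worker *w, bool *port_enabled)
{
	(void)ex_schedule_worker(w);

	*port_enabled = true;
	return 0;
}
```

    Calling the strict variant for two ports that share one worker enables
    only the first port; the relaxed variant enables both.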
    
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/7aae307a-35ca-4209-a850-7b2749d40f90@martin-whitaker.me.uk
    Fixes: bb01ad30570b0 ("net: dsa: microchip: ptp: manipulating absolute time using ptp hw clock")
    Signed-off-by: Martin Whitaker <foss@martin-whitaker.me.uk>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
    Link: https://patch.msgid.link/20240817094141.3332-1-foss@martin-whitaker.me.uk
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: dsa: mv88e6xxx: Fix out-of-bound access [+ + +]

Author: Joseph Huang <Joseph.Huang@garmin.com>
Date:   Mon Aug 19 19:52:50 2024 -0400

    net: dsa: mv88e6xxx: Fix out-of-bound access
    
    [ Upstream commit 528876d867a23b5198022baf2e388052ca67c952 ]
    
    If an ATU violation was caused by a CPU Load operation, the SPID could
    be larger than DSA_MAX_PORTS (the size of mv88e6xxx_chip.ports[] array).
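    
    A bounds check of this kind might look like the sketch below. The
    types, the EX_MAX_PORTS value and the "ignored" counter are assumptions
    made for the illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_PORTS 12 /* assumed array size for this sketch */

struct ex_port {
	unsigned long atu_violations;
};

struct ex_chip {
	struct ex_port ports[EX_MAX_PORTS];
	unsigned long ignored; /* hypothetical: out-of-range SPIDs seen */
};

/* Only index the ports[] array when the source port ID reported by the
 * hardware is within the array bounds. */
static void ex_count_violation(struct ex_chip *chip, unsigned int spid)
{
	if (spid < sizeof(chip->ports) / sizeof(chip->ports[0]))
		chip->ports[spid].atu_violations++;
	else
		chip->ignored++;
}
```

    An SPID such as 31 is then counted separately instead of writing past
    the end of the array.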
    
    Fixes: 75c05a74e745 ("net: dsa: mv88e6xxx: Fix counting of ATU violations")
    Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Link: https://patch.msgid.link/20240819235251.1331763-1-Joseph.Huang@garmin.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: vsc73xx: check busy flag in MDIO operations [+ + +]

Author: Pawel Dembicki <paweldembicki@gmail.com>
Date:   Fri Aug 9 21:38:04 2024 +0200

    net: dsa: vsc73xx: check busy flag in MDIO operations
    
    [ Upstream commit fa63c6434b6f6aaf9d8d599dc899bc0a074cc0ad ]
    
    The VSC73xx has a busy flag used during MDIO operations. It is raised
    when MDIO read/write operations are in progress. Without it, PHYs are
    misconfigured and bus operations do not work as expected.
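    
    Waiting on a busy flag before each bus operation can be sketched as
    below. This is a simulation only: struct ex_mdio_bus and both helpers
    are hypothetical, and the poll limit is an arbitrary choice for the
    example rather than a value from the driver or the datasheet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simulated bus: reports "busy" for a fixed number of status reads. */
struct ex_mdio_bus {
	unsigned int busy_reads_left;
};

/* Stand-in for reading the busy flag from a status register. */
static bool ex_mdio_is_busy(struct ex_mdio_bus *bus)
{
	if (bus->busy_reads_left == 0)
		return false;

	bus->busy_reads_left--;
	return true;
}

/* Poll the busy flag a bounded number of times; the caller should start
 * a read or write only after this returns 0. */
static int ex_mdio_wait_idle(struct ex_mdio_bus *bus, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (!ex_mdio_is_busy(bus))
			return 0;
	}

	return -ETIMEDOUT;
}
```

    Bounding the poll count keeps a stuck flag from hanging the caller; a
    real driver would typically also sleep or delay between polls.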
    
    Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: vsc73xx: fix port MAC configuration in full duplex mode [+ + +]

Author: Pawel Dembicki <paweldembicki@gmail.com>
Date:   Fri Aug 9 21:38:02 2024 +0200

    net: dsa: vsc73xx: fix port MAC configuration in full duplex mode
    
    [ Upstream commit 63796bc2e97cd5ebcef60bad4953259d4ad11cb4 ]
    
    According to the datasheet description ("Port Mode Procedure" in 5.6.2),
    the VSC73XX_MAC_CFG_WEXC_DIS bit is configured only for half duplex mode.
    
    The WEXC_DIS bit is responsible for MAC behavior after an excessive
    collision. Let's set it as described in the datasheet.
    
    Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: vsc73xx: pass value in phy_write operation [+ + +]

Author: Pawel Dembicki <paweldembicki@gmail.com>
Date:   Fri Aug 9 21:38:03 2024 +0200

    net: dsa: vsc73xx: pass value in phy_write operation
    
    [ Upstream commit 5b9eebc2c7a5f0cc7950d918c1e8a4ad4bed5010 ]
    
    In the 'vsc73xx_phy_write' function, the register value is missing,
    and the phy write operation always sends zeros.
    
    This commit passes the value variable into the proper register.
    
    Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ethernet: mtk_wed: fix use-after-free panic in mtk_wed_setup_tc_block_cb() [+ + +]

Author: Zheng Zhang <everything411@qq.com>
Date:   Sat Aug 10 13:26:51 2024 +0800

    net: ethernet: mtk_wed: fix use-after-free panic in mtk_wed_setup_tc_block_cb()
    
    [ Upstream commit db1b4bedb9b97c6d34b03d03815147c04fffe8b4 ]
    
    When there are multiple ap interfaces on one band and with WED on,
    turning the interface down will cause a kernel panic on MT798X.
    
    Previously, cb_priv was freed in mtk_wed_setup_tc_block() without being
    set to NULL, and mtk_wed_setup_tc_block_cb() didn't check the value either.
    
    Assign NULL to cb_priv after freeing it in mtk_wed_setup_tc_block() and
    check for NULL in mtk_wed_setup_tc_block_cb().
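    
    The free-then-clear and check-before-use pattern can be sketched in
    user space as follows. The struct and function names are hypothetical
    and the error code is an assumption; this is not the mtk_wed code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical private data attached to a block callback. */
struct ex_cb_priv {
	int hw_id;
};

struct ex_block {
	struct ex_cb_priv *cb_priv;
};

/* Free the private data and clear the pointer so that later users can
 * tell that it is gone. */
static void ex_block_unbind(struct ex_block *blk)
{
	free(blk->cb_priv);
	blk->cb_priv = NULL;
}

/* Callback: refuse to run once the private data has been released. */
static int ex_block_cb(const struct ex_block *blk)
{
	if (!blk->cb_priv)
		return -EOPNOTSUPP;

	return blk->cb_priv->hw_id;
}
```

    Clearing the pointer also makes a second unbind harmless, because
    free(NULL) is a no-op.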
    
    ----------
    Unable to handle kernel paging request at virtual address 0072460bca32b4f5
    Call trace:
     mtk_wed_setup_tc_block_cb+0x4/0x38
     0xffffffc0794084bc
     tcf_block_playback_offloads+0x70/0x1e8
     tcf_block_unbind+0x6c/0xc8
    ...
    ---------
    
    Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
    Signed-off-by: Zheng Zhang <everything411@qq.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: hns3: fix a deadlock problem when config TC during resetting [+ + +]

Author: Jie Wang <wangjie125@huawei.com>
Date:   Tue Aug 13 22:10:22 2024 +0800

    net: hns3: fix a deadlock problem when config TC during resetting
    
    [ Upstream commit be5e816d00a506719e9dbb1a9c861c5ced30a109 ]
    
    Configuring TC during the reset process may cause a deadlock. The flow is
    as below:
                                 pf reset start
                                     │
                                     ▼
                                  ......
    setup tc                         │
        │                            ▼
        ▼                      DOWN: napi_disable()
    napi_disable()(skip)             │
        │                            │
        ▼                            ▼
      ......                      ......
        │                            │
        ▼                            │
    napi_enable()                    │
                                     ▼
                               UINIT: netif_napi_del()
                                     │
                                     ▼
                                  ......
                                     │
                                     ▼
                               INIT: netif_napi_add()
                                     │
                                     ▼
                                  ......                 global reset start
                                     │                      │
                                     ▼                      ▼
                               UP: napi_enable()(skip)    ......
                                     │                      │
                                     ▼                      ▼
                                  ......                 napi_disable()
    
    In the reset process, the driver will DOWN the port and then UINIT. In this
    case, the setup tc process will UP the port before UINIT, which causes the
    problem. Add a DOWN process in UINIT to fix it.
    
    Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client")
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: hns3: fix wrong use of semaphore up [+ + +]

Author: Jie Wang <wangjie125@huawei.com>
Date:   Tue Aug 13 22:10:20 2024 +0800

    net: hns3: fix wrong use of semaphore up
    
    [ Upstream commit 8445d9d3c03101859663d34fda747f6a50947556 ]
    
    Currently, if an hns3 PF or VF FLR reset fails after five retries,
    the reset done process directly releases the semaphore, which has
    already been released in hclge_reset_prepare_general.
    This causes the down operation to fail.
    
    So this patch fixes it by adding a reset state check. The up operation is
    only called after a successful PF FLR reset.
    
    Fixes: 8627bdedc435 ("net: hns3: refactor the precedure of PF FLR")
    Fixes: f28368bb4542 ("net: hns3: refactor the procedure of VF FLR")
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: hns3: use the user's cfg after reset [+ + +]

Author: Peiyang Wang <wangpeiyang1@huawei.com>
Date:   Tue Aug 13 22:10:21 2024 +0800

    net: hns3: use the user's cfg after reset
    
    [ Upstream commit 30545e17eac1f50c5ef49644daf6af205100a965 ]
    
    Consider the following case: the user changes the speed and then resets
    the net interface. Before the hw changes the speed successfully, the
    driver gets the old speed from the hw via the timer task. After the reset,
    the previous speed is configured to the hw. As a result, the new speed is
    configured successfully but lost after the PF reset. The following diagram
    shows this more directly.
    
    +------+              +----+                 +----+
    | USER |              | PF |                 | HW |
    +---+--+              +-+--+                 +-+--+
        |  ethtool -s 100G  |                      |
        +------------------>|   set speed 100G     |
        |                   +--------------------->|
        |                   |  set successfully    |
        |                   |<---------------------+---+
        |                   |query cfg (timer task)|   |
        |                   +--------------------->|   | handle speed
        |                   |     return 200G      |   | changing event
        |  ethtool --reset  |<---------------------+   | (100G)
        +------------------>|  cfg previous speed  |<--+
        |                   |  after reset (200G)  |
        |                   +--------------------->|
        |                   |                      +---+
        |                   |query cfg (timer task)|   |
        |                   +--------------------->|   | handle speed
        |                   |     return 100G      |   | changing event
        |                   |<---------------------+   | (200G)
        |                   |                      |<--+
        |                   |query cfg (timer task)|
        |                   +--------------------->|
        |                   |     return 200G      |
        |                   |<---------------------+
        |                   |                      |
        v                   v                      v
    
    This patch saves the new speed if the hw changes the speed successfully,
    and the saved speed is used after a successful reset.
    
    Fixes: 2d03eacc0b7e ("net: hns3: Only update mac configuation when necessary")
    Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings [+ + +]

Author: Long Li <longli@microsoft.com>
Date:   Fri Aug 9 08:58:58 2024 -0700

    net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
    
    commit 58a63729c957621f1990c3494c702711188ca347 upstream.
    
    After napi_complete_done() is called when NAPI is polling in the current
    process context, another NAPI may be scheduled and start running in
    softirq on another CPU and may ring the doorbell before the current CPU
    does. When combined with unnecessary rings when there is no need to arm
    the CQ, it triggers error paths in the hardware.
    
    This patch fixes this by calling napi_complete_done() after doorbell
    rings. It limits the number of unnecessary rings when there is
    no need to arm. MANA hardware specifies that there must be one doorbell
    ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
    soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
    workloads, 4 CQ wraparounds prove to be large enough that this limit is
    rarely exceeded before all the NAPI weight is consumed.
    
    To implement this, add a per-CQ counter cq->work_done_since_doorbell,
    and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
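    
    The counter logic described above can be modelled as in the sketch
    below. Only work_done_since_doorbell is a name from this commit
    message; the struct layout, the other fields and the helper functions
    are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified completion queue model. */
struct ex_cq {
	unsigned int num_entries;              /* CQEs per wraparound */
	unsigned int work_done_since_doorbell; /* CQEs since last ring */
	unsigned int doorbell_rings;           /* total rings, for inspection */
	unsigned int arm_rings;                /* rings that armed the CQ */
	bool napi_completed;
};

/* Stand-in for ringing the CQ doorbell, optionally arming the CQ. */
static void ex_ring_doorbell(struct ex_cq *cq, bool arm)
{
	cq->doorbell_rings++;
	if (arm)
		cq->arm_rings++;
	cq->work_done_since_doorbell = 0;
}

/* End of one poll round with work_done CQEs processed out of budget. */
static void ex_poll_done(struct ex_cq *cq, unsigned int work_done,
			 unsigned int budget)
{
	cq->work_done_since_doorbell += work_done;

	if (work_done < budget) {
		/* Ring (and arm) first, then mark polling complete, so no
		 * other context can ring the doorbell ahead of this one. */
		ex_ring_doorbell(cq, true);
		cq->napi_completed = true;
	} else if (cq->work_done_since_doorbell > cq->num_entries * 4) {
		/* Still polling: ring without arming once more than four
		 * wraparounds have been consumed since the last ring. */
		ex_ring_doorbell(cq, false);
	}
}
```

    For a 16-entry queue and a budget of 64, the first full round does not
    ring the doorbell, the second one does (128 exceeds 64), and a later
    partial round rings with the arm bit and completes polling.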
    
    Cc: stable@vger.kernel.org
    Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
    Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
    Signed-off-by: Long Li <longli@microsoft.com>
    Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhyperv.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: mana: Fix RX buf alloc_size alignment and atomic op panic [+ + +]

Author: Haiyang Zhang <haiyangz@microsoft.com>
Date:   Fri Aug 9 14:01:24 2024 -0700

    net: mana: Fix RX buf alloc_size alignment and atomic op panic
    
    commit 32316f676b4ee87c0404d333d248ccf777f739bc upstream.
    
    The MANA driver's RX buffer alloc_size is passed into napi_build_skb() to
    create SKB. skb_shinfo(skb) is located at the end of skb, and its alignment
    is affected by the alloc_size passed into napi_build_skb(). The size needs
    to be aligned properly for better performance and atomic operations.
    Otherwise, on ARM64 CPU, for certain MTU settings like 4000, atomic
    operations may panic on the skb_shinfo(skb)->dataref due to alignment fault.
    
    To fix this bug, add proper alignment to the alloc_size calculation.
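    
    Rounding a buffer size up to an alignment boundary can be sketched as
    follows. The 64-byte value and the helper names are assumptions for
    the illustration; the kernel uses its own alignment macros and
    constants.

```c
#include <assert.h>
#include <stddef.h>

#define EX_CACHE_LINE 64u /* assumed alignment for this sketch */

/* Round size up to the next multiple of align (a power of two). */
static size_t ex_align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Hypothetical RX buffer size: payload plus padding and headroom,
 * rounded up so that data placed at the end of the buffer is aligned. */
static size_t ex_rx_alloc_size(size_t mtu, size_t pad, size_t headroom)
{
	return ex_align_up(mtu + pad + headroom, EX_CACHE_LINE);
}
```

    With this helper an unaligned total such as 4090 becomes 4096, while
    an already aligned size is left unchanged.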
    
    Sample panic info:
    [  253.298819] Unable to handle kernel paging request at virtual address ffff000129ba5cce
    [  253.300900] Mem abort info:
    [  253.301760]   ESR = 0x0000000096000021
    [  253.302825]   EC = 0x25: DABT (current EL), IL = 32 bits
    [  253.304268]   SET = 0, FnV = 0
    [  253.305172]   EA = 0, S1PTW = 0
    [  253.306103]   FSC = 0x21: alignment fault
    Call trace:
     __skb_clone+0xfc/0x198
     skb_clone+0x78/0xe0
     raw6_local_deliver+0xfc/0x228
     ip6_protocol_deliver_rcu+0x80/0x500
     ip6_input_finish+0x48/0x80
     ip6_input+0x48/0xc0
     ip6_sublist_rcv_finish+0x50/0x78
     ip6_sublist_rcv+0x1cc/0x2b8
     ipv6_list_rcv+0x100/0x150
     __netif_receive_skb_list_core+0x180/0x220
     netif_receive_skb_list_internal+0x198/0x2a8
     __napi_poll+0x138/0x250
     net_rx_action+0x148/0x330
     handle_softirqs+0x12c/0x3a0
    
    Cc: stable@vger.kernel.org
    Fixes: 80f6215b450e ("net: mana: Add support for jumbo frame")
    Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
    Reviewed-by: Long Li <longli@microsoft.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: mctp: test: Use correct skb for route input check [+ + +]

Author: Jeremy Kerr <jk@codeconstruct.com.au>
Date:   Fri Aug 16 18:29:17 2024 +0800

    net: mctp: test: Use correct skb for route input check
    
    [ Upstream commit ce335db0621648472f9bb4b7191eb2e13a5793cf ]
    
    In the MCTP route input test, we're routing one skb, then (when delivery
    is expected) checking the resulting routed skb.
    
    However, we're currently checking the original skb length, rather than
    the routed skb. Check the routed skb instead; the original will have
    been freed at this point.
    
    Fixes: 8892c0490779 ("mctp: Add route input to socket tests")
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Closes: https://lore.kernel.org/kernel-janitors/4ad204f0-94cf-46c5-bdab-49592addf315@kili.mountain/
    Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20240816-mctp-kunit-skb-fix-v1-1-3c367ac89c27@codeconstruct.com.au
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: fix QoS class for injected packets with "ocelot-8021q" [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Aug 15 03:07:03 2024 +0300

    net: mscc: ocelot: fix QoS class for injected packets with "ocelot-8021q"
    
    [ Upstream commit e1b9e80236c540fa85d76e2d510d1b38e1968c5d ]
    
    There are 2 distinct code paths (listed below) in the source code which
    set up an injection header for Ocelot(-like) switches. Code path (2)
    lacks the QoS class and source port being set correctly. Especially the
    improper QoS classification is a problem for the "ocelot-8021q"
    alternative DSA tagging protocol, because we support tc-taprio and each
    packet needs to be scheduled precisely through its time slot. This
    includes PTP, which is normally assigned to a traffic class other than
    0, but would be sent through TC 0 nonetheless.
    
    The code paths are:
    
    (1) ocelot_xmit_common() from net/dsa/tag_ocelot.c - called only by the
        standard "ocelot" DSA tagging protocol which uses NPI-based
        injection - sets up bit fields in the tag manually to account for
        a small difference (destination port offset) between Ocelot and
        Seville. Namely, ocelot_ifh_set_dest() is left out of
        ocelot_xmit_common(), because there's also seville_ifh_set_dest().
    
    (2) ocelot_ifh_set_basic(), called by:
        - ocelot_fdma_prepare_skb() for FDMA transmission of the ocelot
          switchdev driver
        - ocelot_port_xmit() -> ocelot_port_inject_frame() for
          register-based transmission of the ocelot switchdev driver
        - felix_port_deferred_xmit() -> ocelot_port_inject_frame() for the
          DSA tagger ocelot-8021q when it must transmit PTP frames (also
          through register-based injection).
        sets the bit fields according to its own logic.
    
    The problem is that (2) doesn't call ocelot_ifh_set_qos_class().
    Copying that logic from ocelot_xmit_common() fixes that.
    
    Unfortunately, although desirable, it is not easily possible to
    de-duplicate code paths (1) and (2), and make net/dsa/tag_ocelot.c
    directly call ocelot_ifh_set_basic(), because of the ocelot/seville
    difference. This is the "minimal" fix with some logic duplicated (but
    at least more consolidated).
    
    Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: serialize access to the injection/extraction groups [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Aug 15 03:07:04 2024 +0300

    net: mscc: ocelot: serialize access to the injection/extraction groups
    
    [ Upstream commit c5e12ac3beb0dd3a718296b2d8af5528e9ab728e ]
    
    As explained by Horatiu Vultur in commit 603ead96582d ("net: sparx5: Add
    spinlock for frame transmission from CPU") which is for a similar
    hardware design, multiple CPUs can simultaneously perform injection
    or extraction. There are only 2 register groups for injection and 2
    for extraction, and the driver only uses one of each. So we'd better
    serialize access using spin locks, otherwise frame corruption is
    possible.
    
    Note that unlike in sparx5, FDMA in ocelot does not have this issue
    because struct ocelot_fdma_tx_ring already contains an xmit_lock.
    
    I guess this is mostly a problem for NXP LS1028A, as that is dual core.
    I don't think VSC7514 is. So I'm blaming the commit where LS1028A (aka
    the felix DSA driver) started using register-based packet injection and
    extraction.
    
    Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Thu Aug 15 03:07:02 2024 +0300

    net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection
    
    [ Upstream commit 67c3ca2c5cfe6a50772514e3349b5e7b3b0fac03 ]
    
    Problem description
    -------------------
    
    On an NXP LS1028A (felix DSA driver) with the following configuration:
    
    - ocelot-8021q tagging protocol
    - VLAN-aware bridge (with STP) spanning at least swp0 and swp1
    - 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700
    - ptp4l on swp0.700 and swp1.700
    
    we see that the ptp4l instances do not see each other's traffic,
    and they all go to the grand master state due to the
    ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition.
    
    Jumping to the conclusion for the impatient
    -------------------------------------------
    
    There is a zero-day bug in the ocelot switchdev driver in the way it
    handles VLAN-tagged packet injection. The correct logic already exists in
    the source code, in function ocelot_xmit_get_vlan_info() added by commit
    5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
    But it is used only for normal NPI-based injection with the DSA "ocelot"
    tagging protocol. The other injection code paths (register-based and
    FDMA-based) roll their own wrong logic. This affects and was noticed on
    the DSA "ocelot-8021q" protocol because it uses register-based injection.
    
    By moving ocelot_xmit_get_vlan_info() to a place that's common for both
    the DSA tagger and the ocelot switch library, it can also be called from
    ocelot_port_inject_frame() in ocelot.c.
    
    We need to touch the lines with ocelot_ifh_port_set()'s prototype
    anyway, so let's rename it to something clearer regarding what it does,
    and add a kernel-doc. ocelot_ifh_set_basic() should do.
    
    Investigation notes
    -------------------
    
    Debugging reveals that PTP event (aka those carrying timestamps, like
    Sync) frames injected into swp0.700 (but also swp1.700) hit the wire
    with two VLAN tags:
    
    00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc
                                                  ~~~~~~~~~~~
    00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00
              ~~~~~~~~~~~
    00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03
    00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00
    00000040: 00 00
    
    The second (unexpected) VLAN tag makes felix_check_xtr_pkt() ->
    ptp_classify_raw() fail to see these as PTP packets at the link
    partner's receiving end, and return PTP_CLASS_NONE (because the BPF
    classifier is not written to expect 2 VLAN tags).
    
    The reason why packets have 2 VLAN tags is because the transmission
    code treats VLAN incorrectly.
    
    Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX
    feature. Therefore, at xmit time, all VLANs should be in the skb head,
    and none should be in the hwaccel area. This is done by:
    
    static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
                                              netdev_features_t features)
    {
            if (skb_vlan_tag_present(skb) &&
                !vlan_hw_offload_capable(features, skb->vlan_proto))
                    skb = __vlan_hwaccel_push_inside(skb);
            return skb;
    }
    
    But ocelot_port_inject_frame() handles things incorrectly:
    
            ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));
    
    void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag)
    {
            (...)
            if (vlan_tag)
                    ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
            (...)
    }
    
    The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head
    is by calling:
    
    static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
    {
            skb->vlan_present = 0;
    }
    
    which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get().
    This means that ocelot, when it calls skb_vlan_tag_get(), sees
    (and uses) a residual skb->vlan_tci, while the same VLAN tag is
    _already_ in the skb head.
    
    The trivial fix for double VLAN headers is to replace the content of
    ocelot_ifh_port_set() with:
    
            if (skb_vlan_tag_present(skb))
                    ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
    
    but this would not be correct either, because, as mentioned,
    vlan_hw_offload_capable() is false for us, so we'd be inserting dead
    code and we'd always transmit packets with VID=0 in the injection frame
    header.
    
    I can't actually test the ocelot switchdev driver and rely exclusively
    on code inspection, but I don't think traffic from 8021q uppers has ever
    been injected properly, and not double-tagged. Thus I'm blaming the
    introduction of VLAN fields in the injection header - early driver code.
    
    As hinted at in the early conclusion, what we _want_ to happen for
    VLAN transmission was already described once in commit 5ca721c54d86
    ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
    
    ocelot_xmit_get_vlan_info() intends to ensure that if the port through
    which we're transmitting is under a VLAN-aware bridge, the outer VLAN
    tag from the skb head is stripped from there and inserted into the
    injection frame header (so that the packet is processed in hardware
    through that actual VLAN). And in all other cases, the packet is sent
    with VID=0 in the injection frame header, since the port is VLAN-unaware
    and has logic to strip this VID on egress (making it invisible to the
    wire).
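    
    As a rough illustration of the behaviour described in the paragraph
    above (not the kernel's ocelot_xmit_get_vlan_info() itself), the
    sketch below works on a raw Ethernet frame. All demo_* names are
    hypothetical, and the case of an untagged frame under a VLAN-aware
    bridge, which the text above does not describe, simply yields 0 here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN   6       /* bytes per MAC address */
#define DEMO_VLAN_HLEN  4       /* TPID + TCI */
#define DEMO_TPID_8021Q 0x8100

/* Returns the TCI to place in the injection frame header. For a
 * VLAN-aware port the outer 802.1Q tag is removed from the frame. */
static uint16_t demo_get_vlan_tci(uint8_t *frame, size_t *len, bool vlan_aware)
{
	size_t off = 2 * DEMO_ETH_ALEN; /* offset of the EtherType / TPID */
	uint16_t tpid, tci;

	assert(frame != NULL && len != NULL);

	/* VLAN-unaware port, or frame too short to hold a tag: VID 0. */
	if (!vlan_aware || *len < off + DEMO_VLAN_HLEN)
		return 0;

	tpid = (uint16_t)(frame[off] << 8 | frame[off + 1]);
	if (tpid != DEMO_TPID_8021Q)
		return 0;

	tci = (uint16_t)(frame[off + 2] << 8 | frame[off + 3]);

	/* Strip the tag: slide the rest of the frame over it. */
	memmove(frame + off, frame + off + DEMO_VLAN_HLEN,
		*len - off - DEMO_VLAN_HLEN);
	*len -= DEMO_VLAN_HLEN;
	return tci;
}
```

    Fed the first 18 bytes of a singly tagged version of the frame above,
    the sketch returns TCI 0x02bc (VID 700) and leaves the PTP EtherType
    88 f7 directly after the source MAC address.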
    
    Fixes: 08d02364b12f ("net: mscc: fix the injection header")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ngbe: Fix phy mode set to external phy [+ + +]

Author: Mengyuan Lou <mengyuanlou@net-swift.com>
Date:   Tue Aug 20 11:04:25 2024 +0800

    net: ngbe: Fix phy mode set to external phy
    
    commit f2916c83d746eb99f50f42c15cf4c47c2ea5f3b3 upstream.
    
    The MAC always adds the TX delay, and this cannot be changed. If both
    the MAC and the PHY add a TX delay, transmission problems result. So
    disable the TX delay in the PHY: when RGMII is used to attach to an
    external PHY, pass PHY_INTERFACE_MODE_RGMII_RXID to the PHY driver.
    This makes no difference for the internal PHY.
    
    Fixes: bc2426d74aa3 ("net: ngbe: convert phylib to phylink")
    Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
    Cc: stable@vger.kernel.org # 6.3+
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://patch.msgid.link/E6759CF1387CF84C+20240820030425.93003-1-mengyuanlou@net-swift.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: ovs: fix ovs_drop_reasons error [+ + +]

Author: Menglong Dong <menglong8.dong@gmail.com>
Date:   Wed Aug 21 20:32:52 2024 +0800

    net: ovs: fix ovs_drop_reasons error
    
    [ Upstream commit 57fb67783c4011581882f32e656d738da1f82042 ]
    
    There is something wrong with ovs_drop_reasons. ovs_drop_reasons[0] is
    "OVS_DROP_LAST_ACTION", but OVS_DROP_LAST_ACTION == __OVS_DROP_REASON + 1,
    which means that ovs_drop_reasons[1] should be "OVS_DROP_LAST_ACTION".
    
    And as Adrian tested, without the patch, adding flow to drop packets
    results in:
    
    drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97)
    origin: software
    input port ifindex: 8
    timestamp: Tue Aug 20 10:19:17 2024 859853461 nsec
    protocol: 0x800
    length: 98
    original length: 98
    drop reason: OVS_DROP_ACTION_ERROR
    
    With the patch, the same results in:
    
    drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97)
    origin: software
    input port ifindex: 8
    timestamp: Tue Aug 20 10:16:13 2024 475856608 nsec
    protocol: 0x800
    length: 98
    original length: 98
    drop reason: OVS_DROP_LAST_ACTION
    
    Fix this by initializing ovs_drop_reasons with index.
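    
    The off-by-one and the indexed initialization can be illustrated with
    a stand-alone sketch. The enum, base value, mask and table names below
    are hypothetical stand-ins chosen for the example; only the technique
    (C designated initializers keyed by the enum value's low bits) is the
    point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subsystem base and mask, standing in for the real ones. */
#define DEMO_SUBSYS_BASE 0x30000u
#define DEMO_SUBSYS_MASK 0xffff0000u
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum demo_drop_reason {
	DEMO_DROP_REASON_BASE = DEMO_SUBSYS_BASE,
	DEMO_DROP_LAST_ACTION,   /* base + 1 */
	DEMO_DROP_ACTION_ERROR,  /* base + 2 */
};

/* Positional initialization: the first name lands at index 0, although
 * the first reason has the value base + 1. */
static const char *const demo_positional[] = {
	"DEMO_DROP_LAST_ACTION",
	"DEMO_DROP_ACTION_ERROR",
};

/* Indexed initialization: each name is stored at its own enum offset. */
#define DEMO_S(x) [(x) & ~DEMO_SUBSYS_MASK] = #x
static const char *const demo_indexed[] = {
	DEMO_S(DEMO_DROP_LAST_ACTION),
	DEMO_S(DEMO_DROP_ACTION_ERROR),
};
#undef DEMO_S

static const char *demo_lookup(const char *const *table, size_t n,
			       enum demo_drop_reason reason)
{
	size_t idx = (unsigned int)reason & ~DEMO_SUBSYS_MASK;

	assert(table != NULL);
	return (idx < n && table[idx]) ? table[idx] : "(unknown)";
}

static const char *demo_name_positional(enum demo_drop_reason reason)
{
	return demo_lookup(demo_positional,
			   DEMO_ARRAY_SIZE(demo_positional), reason);
}

static const char *demo_name_indexed(enum demo_drop_reason reason)
{
	return demo_lookup(demo_indexed, DEMO_ARRAY_SIZE(demo_indexed), reason);
}
```

    With the positional table, looking up the "last action" reason yields
    the name of the following reason, which is the same shift seen in the
    two drop reports above.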
    
    Fixes: 9d802da40b7c ("net: openvswitch: add last-action drop reason")
    Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
    Tested-by: Adrian Moreno <amorenoz@redhat.com>
    Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
    Link: https://patch.msgid.link/20240821123252.186305-1-dongml2@chinatelecom.cn
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: xilinx: axienet: Always disable promiscuous mode [+ + +]

Author: Sean Anderson <sean.anderson@linux.dev>
Date:   Thu Aug 22 11:40:55 2024 -0400

    net: xilinx: axienet: Always disable promiscuous mode
    
    [ Upstream commit 4ae738dfef2c0323752ab81786e2d298c9939321 ]
    
    If promiscuous mode is disabled when there are fewer than four multicast
    addresses, then it will not be reflected in the hardware. Fix this by
    always clearing the promiscuous mode flag even when we program multicast
    addresses.
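    
    A simplified model of the control flow may help. The bit position, the
    filter-slot count and the function below are assumptions made for the
    sketch and are not taken from the driver; the always_clear parameter
    exists only to show the behaviour without and with the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FMI_PM_MASK 0x80000000u /* hypothetical promiscuous-mode bit */
#define DEMO_CAM_ENTRIES 4           /* hypothetical number of filter slots */

/* Returns the new filter-mode register value for the requested RX mode. */
static uint32_t demo_set_rx_mode(uint32_t reg, bool promisc, int mc_count,
				 bool always_clear)
{
	assert(mc_count >= 0);

	/* Promiscuous requested, or too many addresses for the filter. */
	if (promisc || mc_count > DEMO_CAM_ENTRIES)
		return reg | DEMO_FMI_PM_MASK;

	if (mc_count > 0) {
		/* The filter slots would be programmed here. Without the
		 * fix, a previously set promiscuous bit survives. */
		if (always_clear)
			reg &= ~DEMO_FMI_PM_MASK;
		return reg;
	}

	/* No multicast addresses at all: the bit is cleared. */
	return reg & ~DEMO_FMI_PM_MASK;
}
```

    In this model, enabling promiscuous mode and then disabling it while
    two multicast addresses are configured leaves the bit set unless
    always_clear is true.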
    
    Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
    Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20240822154059.1066595-2-sean.anderson@linux.dev
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: xilinx: axienet: Fix dangling multicast addresses [+ + +]

Author: Sean Anderson <sean.anderson@linux.dev>
Date:   Thu Aug 22 11:40:56 2024 -0400

    net: xilinx: axienet: Fix dangling multicast addresses
    
    [ Upstream commit 797a68c9de0f5a5447baf4bd3bb9c10a3993435b ]
    
    If a multicast address is removed but there are still some multicast
    addresses, that address would remain programmed into the frame filter.
    Fix this by explicitly setting the enable bit for each filter.
    
    Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
    Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20240822154059.1066595-3-sean.anderson@linux.dev
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netem: fix return value if duplicate enqueue fails [+ + +]

Author: Stephen Hemminger <stephen@networkplumber.org>
Date:   Mon Aug 19 10:56:45 2024 -0700

    netem: fix return value if duplicate enqueue fails
    
    [ Upstream commit c07ff8592d57ed258afee5a5e04991a48dbaf382 ]
    
    There is a bug in netem_enqueue() introduced by
    commit 5845f706388a ("net: netem: fix skb length BUG_ON in __skb_to_sgvec")
    that can lead to a use-after-free.
    
    This commit made netem_enqueue() always return NET_XMIT_SUCCESS
    when a packet is duplicated, which can cause the parent qdisc's q.qlen
    to be mistakenly incremented. When this happens qlen_notify() may be
    skipped on the parent during destruction, leaving a dangling pointer
    for some classful qdiscs like DRR.
    
    There are two ways for the bug to happen:
    
    - If the duplicated packet is dropped by rootq->enqueue() and then
      the original packet is also dropped.
    - If rootq->enqueue() sends the duplicated packet to a different qdisc
      and the original packet is dropped.
    
    In both cases NET_XMIT_SUCCESS is returned even though no packets
    are enqueued at the netem qdisc.
    
    The fix is to defer the enqueue of the duplicate packet until after
    the original packet has been guaranteed to return NET_XMIT_SUCCESS.
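    
    The ordering change can be sketched with a toy queue model. The
    struct, return codes and functions below are hypothetical and much
    simpler than netem_enqueue(); they only show why enqueuing the
    duplicate first can report success with nothing queued locally.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_XMIT_SUCCESS = 0, DEMO_XMIT_DROP = 1 };

/* Toy qdisc: a counter with a limit. */
struct demo_qdisc {
	int qlen;
	int limit;
};

static int demo_fifo_enqueue(struct demo_qdisc *q)
{
	assert(q != NULL);
	if (q->qlen >= q->limit)
		return DEMO_XMIT_DROP;
	q->qlen++;
	return DEMO_XMIT_SUCCESS;
}

/* Pre-fix ordering: the duplicate goes to the root first and success is
 * reported no matter what happens to either packet. */
static int demo_enqueue_dup_first(struct demo_qdisc *netem,
				  struct demo_qdisc *root)
{
	(void)demo_fifo_enqueue(root);   /* duplicate */
	(void)demo_fifo_enqueue(netem);  /* original, result ignored */
	return DEMO_XMIT_SUCCESS;
}

/* Post-fix ordering: the duplicate is only enqueued once the original
 * is known to have been accepted. */
static int demo_enqueue_dup_deferred(struct demo_qdisc *netem,
				     struct demo_qdisc *root)
{
	if (demo_fifo_enqueue(netem) != DEMO_XMIT_SUCCESS)
		return DEMO_XMIT_DROP;
	(void)demo_fifo_enqueue(root);   /* duplicate */
	return DEMO_XMIT_SUCCESS;
}
```

    With both queues full, the first variant still returns success while
    the local queue length stays 0, which is the mismatch that lets a
    parent's q.qlen drift.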
    
    Fixes: 5845f706388a ("net: netem: fix skb length BUG_ON in __skb_to_sgvec")
    Reported-by: Budimir Markovic <markovicbudimir@gmail.com>
    Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20240819175753.5151-1-stephen@networkplumber.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: allow ipv6 fragments to arrive on different devices [+ + +]

Author: Tom Hughes <tom@compton.nu>
Date:   Tue Aug 6 12:40:52 2024 +0100

    netfilter: allow ipv6 fragments to arrive on different devices
    
    [ Upstream commit 3cd740b985963f874a1a094f1969e998b9d05554 ]
    
    Commit 264640fc2c5f4 ("ipv6: distinguish frag queues by device
    for multicast and link-local packets") modified the ipv6 fragment
    reassembly logic to distinguish frag queues by device for multicast
    and link-local packets but in fact only the main reassembly code
    limits the use of the device to those address types and the netfilter
    reassembly code uses the device for all packets.
    
    This means that if fragments of a packet arrive on different interfaces
    then netfilter will fail to reassemble them and the fragments will be
    expired without going any further through the filters.
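    
    A minimal sketch of the intended keying rule follows, assuming the
    decision is made on the destination address. The demo_* helpers are
    hypothetical and classify addresses by prefix only (ff00::/8 for
    multicast, fe80::/10 for link-local).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool demo_is_multicast(const uint8_t addr[16])
{
	return addr[0] == 0xff;                             /* ff00::/8 */
}

static bool demo_is_linklocal(const uint8_t addr[16])
{
	return addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80; /* fe80::/10 */
}

/* Interface index to use in the fragment queue key: the arrival device
 * only matters for multicast and link-local destinations. */
static int demo_frag_key_iif(const uint8_t daddr[16], int ifindex)
{
	assert(daddr != NULL);
	if (demo_is_multicast(daddr) || demo_is_linklocal(daddr))
		return ifindex;
	return 0;
}
```

    With this rule, fragments of a packet to a global address map to the
    same key whichever interface they arrive on, while link-local and
    multicast traffic stays separated per device.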
    
    Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
    Signed-off-by: Tom Hughes <tom@compton.nu>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: flowtable: initialise extack before use [+ + +]

Author: Donald Hunter <donald.hunter@gmail.com>
Date:   Tue Aug 6 17:16:37 2024 +0100

    netfilter: flowtable: initialise extack before use
    
    [ Upstream commit e9767137308daf906496613fd879808a07f006a2 ]
    
    Fix missing initialisation of extack in flow offload.
    
    Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
    Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: flowtable: validate vlan header [+ + +]

Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Aug 13 12:39:46 2024 +0200

    netfilter: flowtable: validate vlan header
    
    [ Upstream commit 6ea14ccb60c8ab829349979b22b58a941ec4a3ee ]
    
    Ensure there is sufficient room to access the protocol field of the
    VLAN header, validate it once before the flowtable lookup.
    
    =====================================================
    BUG: KMSAN: uninit-value in nf_flow_offload_inet_hook+0x45a/0x5f0 net/netfilter/nf_flow_table_inet.c:32
     nf_flow_offload_inet_hook+0x45a/0x5f0 net/netfilter/nf_flow_table_inet.c:32
     nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
     nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
     nf_hook_ingress include/linux/netfilter_netdev.h:34 [inline]
     nf_ingress net/core/dev.c:5440 [inline]
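    
    The kind of length check described above can be sketched on a raw
    frame buffer. This is an illustration only, with hypothetical demo_*
    names and constants; it is not the flowtable code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN    14     /* dst MAC + src MAC + EtherType */
#define DEMO_VLAN_HLEN   4      /* TCI + encapsulated protocol */
#define DEMO_ETH_P_8021Q 0x8100

/* Stores the effective protocol in *proto. Returns false when the frame
 * is too short for the field that would have to be read. */
static bool demo_frame_proto(const uint8_t *frame, size_t len, uint16_t *proto)
{
	uint16_t outer;

	assert(frame != NULL && proto != NULL);
	if (len < DEMO_ETH_HLEN)
		return false;

	outer = (uint16_t)(frame[12] << 8 | frame[13]);
	if (outer != DEMO_ETH_P_8021Q) {
		*proto = outer;
		return true;
	}

	/* Tagged frame: make sure the VLAN header is really there before
	 * reading its protocol field. */
	if (len < DEMO_ETH_HLEN + DEMO_VLAN_HLEN)
		return false;

	*proto = (uint16_t)(frame[16] << 8 | frame[17]);
	return true;
}
```

    A 14-byte frame that claims to be VLAN-tagged is rejected here instead
    of having bytes beyond its end interpreted as a protocol.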
    
    Fixes: 4cd91f7c290f ("netfilter: flowtable: add vlan support")
    Reported-by: syzbot+8407d9bb88cd4c6bf61a@syzkaller.appspotmail.com
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_queue: drop packets with cloned unconfirmed conntracks [+ + +]

Author: Florian Westphal <fw@strlen.de>
Date:   Wed Aug 7 21:28:41 2024 +0200

    netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
    
    [ Upstream commit 7d8dc1c7be8d3509e8f5164dd5df64c8e34d7eeb ]
    
    Conntrack assumes an unconfirmed entry (not yet committed to global hash
    table) has a refcount of 1 and is not visible to other cores.
    
    With multicast forwarding this assumption breaks down because such
    skbs get cloned after being picked up, i.e.  ct->use refcount is > 1.
    
    Likewise, bridge netfilter will clone broad/multicast frames and
    all frames in case they need to be flood-forwarded during learning
    phase.
    
    For ip multicast forwarding or plain bridge flood-forward this will
    "work" because packets don't leave softirq and are implicitly
    serialized.
    
    With nfqueue this no longer holds true, the packets get queued
    and can be reinjected in arbitrary ways.
    
    Disable this feature; I see no other solution.
    
    After this patch, nfqueue cannot queue packets except the last
    multicast/broadcast packet.
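    
    The condition being guarded against can be written down as a small
    predicate. The struct and function below are hypothetical stand-ins
    for the conntrack entry and the check; they only restate the
    assumption from the first paragraph (unconfirmed implies a refcount
    of 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy conntrack entry: confirmation state plus reference count. */
struct demo_ct {
	bool confirmed;    /* committed to the global hash table? */
	unsigned int use;  /* reference count */
};

/* True when queueing the packet would expose an unconfirmed entry that
 * is shared with other (cloned) packets. */
static bool demo_must_drop_instead_of_queue(const struct demo_ct *ct)
{
	if (ct == NULL)
		return false;  /* no conntrack attached */

	assert(ct->use >= 1);
	return !ct->confirmed && ct->use > 1;
}
```

    Only the last holder of an unconfirmed entry sees a refcount of 1,
    which is why only the last multicast/broadcast clone can still be
    queued.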
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests [+ + +]

Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Aug 9 15:07:32 2024 +0200

    netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
    
    [ Upstream commit bd662c4218f9648e888bebde9468146965f3f8a0 ]
    
    Objects' dump callbacks are not concurrency-safe per-se with reset bit
    set. If two CPUs perform a reset at the same time, at least counter and
    quota objects suffer from value underrun.
    
    Prevent this by introducing dedicated locking callbacks for nfnetlink
    and the asynchronous dump handling to serialize access.
    
    Fixes: 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for stateful objects")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_tables: Audit log dump reset after the fact [+ + +]

Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Aug 9 15:07:30 2024 +0200

    netfilter: nf_tables: Audit log dump reset after the fact
    
    [ Upstream commit e0b6648b0446e59522819c75ba1dcb09e68d3e94 ]
    
    In theory, dumpreset may fail and invalidate the preceding log message.
    Fix this and use the occasion to prepare for object reset locking, which
    benefits from a few unrelated changes:
    
    * Add an early call to nfnetlink_unicast if not resetting which
      effectively skips the audit logging but also unindents it.
    * Extract the table's name from the netlink attribute (which is verified
      via earlier table lookup) to not rely upon validity of the looked up
      table pointer.
    * Do not use local variable family, it will vanish.
    
    Fixes: 8e6cf365e1d5 ("audit: log nftables configuration change events")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_tables: Introduce nf_tables_getobj_single [+ + +]

Author: Phil Sutter <phil@nwl.cc>
Date:   Fri Aug 9 15:07:31 2024 +0200

    netfilter: nf_tables: Introduce nf_tables_getobj_single
    
    [ Upstream commit 69fc3e9e90f1afc11f4015e6b75d18ab9acee348 ]
    
    Outsource the reply skb preparation for non-dump getobj requests into a
    distinct function. Prep work for object reset locking.
    
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nfnetlink: Initialise extack before use in ACKs [+ + +]

Author: Donald Hunter <donald.hunter@gmail.com>
Date:   Tue Aug 6 16:43:24 2024 +0100

    netfilter: nfnetlink: Initialise extack before use in ACKs
    
    [ Upstream commit d1a7b382a9d3f0f3e5a80e0be2991c075fa4f618 ]
    
    Add missing extack initialisation when ACKing BATCH_BEGIN and BATCH_END.
    
    Fixes: bf2ac490d28c ("netfilter: nfnetlink: Handle ACK flags for batch messages")
    Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nft_counter: Disable BH in nft_counter_offload_stats(). [+ + +]

Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Tue Aug 20 09:54:30 2024 +0200

    netfilter: nft_counter: Disable BH in nft_counter_offload_stats().
    
    [ Upstream commit 1eacdd71b3436b54d5fc8218c4bb0187d92a6892 ]
    
    The sequence counter nft_counter_seq is a per-CPU counter. There is no
    lock associated with it. nft_counter_do_eval() is using the same counter
    and disables BH which suggests that it can be invoked from a softirq.
    This in turn means that nft_counter_offload_stats(), which disables only
    preemption, can be interrupted by nft_counter_do_eval() leading to two
    writers for one seqcount_t.
    This can lead to losing stats or reading statistics while they are
    updated.
    
    Disable BH during stats update in nft_counter_offload_stats() to ensure
    one writer at a time.
    
    Fixes: b72920f6e4a9d ("netfilter: nftables: counter hardware offload support")
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nft_counter: Synchronize nft_counter_reset() against reader. [+ + +]

Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Tue Aug 20 09:54:31 2024 +0200

    netfilter: nft_counter: Synchronize nft_counter_reset() against reader.
    
    [ Upstream commit a0b39e2dc7017ac667b70bdeee5293e410fab2fb ]
    
    nft_counter_reset() resets the counter by subtracting the previously
    retrieved value from the counter. This is a write operation on the
    counter and as such it requires to be performed with a write sequence of
    nft_counter_seq to serialize against its possible reader.
    
    Update the packets/bytes within write-sequence of nft_counter_seq.
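    
    The idea of wrapping the subtraction in a write sequence can be shown
    with a single-threaded model of a sequence counter. Everything here
    (struct, helpers) is a hypothetical sketch; it has no memory barriers
    or per-CPU handling and is not the kernel seqcount_t API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_counter {
	unsigned int seq;  /* even: stable, odd: write in progress */
	uint64_t packets;
	uint64_t bytes;
};

static void demo_write_begin(struct demo_counter *c) { c->seq++; }
static void demo_write_end(struct demo_counter *c)   { c->seq++; }

static unsigned int demo_read_begin(const struct demo_counter *c)
{
	return c->seq;
}

/* A reader must retry if a write was running or has happened since. */
static bool demo_read_retry(const struct demo_counter *c, unsigned int start)
{
	return (start & 1) || c->seq != start;
}

/* Reset by subtracting the values that were just dumped, inside a write
 * sequence so that a concurrent reader notices the update. */
static void demo_counter_reset(struct demo_counter *c,
			       uint64_t packets, uint64_t bytes)
{
	assert(c != NULL);
	assert(c->packets >= packets && c->bytes >= bytes);

	demo_write_begin(c);
	c->packets -= packets;
	c->bytes -= bytes;
	demo_write_end(c);
}
```

    A reader that sampled the sequence before the reset is told to retry
    afterwards, so it never reports a half-updated packets/bytes pair.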
    
    Fixes: d84701ecbcd6a ("netfilter: nft_counter: rework atomic dump and reset")
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag" [+ + +]

Author: David Howells <dhowells@redhat.com>
Date:   Tue Jul 30 17:01:40 2024 +0100

    netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
    
    commit 8e5ced7804cb9184c4a23f8054551240562a8eda upstream.
    
    This reverts commit ae678317b95e760607c7b20b97c9cd4ca9ed6e1a.
    
    Revert the patch that removes the deprecated use of PG_private_2 in
    netfslib for the moment as Ceph is actually still using this to track
    data copied to the cache.
    
    Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag")
    Reported-by: Max Kellermann <max.kellermann@ionos.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Ilya Dryomov <idryomov@gmail.com>
    cc: Xiubo Li <xiubli@redhat.com>
    cc: Jeff Layton <jlayton@kernel.org>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: ceph-devel@vger.kernel.org
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-mm@kvack.org
    Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfs: Fault in smaller chunks for non-large folio mappings [+ + +]

Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon May 27 21:17:32 2024 +0100

    netfs: Fault in smaller chunks for non-large folio mappings
    
    [ Upstream commit 98055bc3595500bcf2126b93b1595354bdb86a66 ]
    
    As in commit 4e527d5841e2 ("iomap: fault in smaller chunks for non-large
    folio mappings"), we can see a performance loss for filesystems
    which have not yet been converted to large folios.
    
    Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write")
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Link: https://lore.kernel.org/r/20240527201735.1898381-1-willy@infradead.org
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

nouveau/firmware: use dma non-coherent allocator [+ + +]

Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Aug 16 06:19:23 2024 +1000

    nouveau/firmware: use dma non-coherent allocator
    
    commit 9b340aeb26d50e9a9ec99599e2a39b035fac978e upstream.
    
    Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
    BUG() on startup, when the iommu is enabled:
    
    kernel BUG at include/linux/scatterlist.h:187!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
    Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
    RIP: 0010:sg_init_one+0x85/0xa0
    Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
    24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
    0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
    RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
    RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
    RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
    R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
    FS:  00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
    Call Trace:
     <TASK>
     ? die+0x36/0x90
     ? do_trap+0xdd/0x100
     ? sg_init_one+0x85/0xa0
     ? do_error_trap+0x65/0x80
     ? sg_init_one+0x85/0xa0
     ? exc_invalid_op+0x50/0x70
     ? sg_init_one+0x85/0xa0
     ? asm_exc_invalid_op+0x1a/0x20
     ? sg_init_one+0x85/0xa0
     nvkm_firmware_ctor+0x14a/0x250 [nouveau]
     nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
     ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
     r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
     ? srso_return_thunk+0x5/0x5f
     ? srso_return_thunk+0x5/0x5f
     ? nvkm_udevice_new+0x95/0x140 [nouveau]
     ? srso_return_thunk+0x5/0x5f
     ? srso_return_thunk+0x5/0x5f
     ? ktime_get+0x47/0xb0
    
    Fix this by using the non-coherent allocator instead. I think there
    might be a better answer to this, but it involves ripping up some of
    the APIs using sg lists.
    
    Cc: stable@vger.kernel.org
    Fixes: 2541626cfb79 ("drm/nouveau/acr: use common falcon HS FW code for ACR FWs")
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Signed-off-by: Danilo Krummrich <dakr@kernel.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240815201923.632803-1-airlied@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nvme: move stopping keep-alive into nvme_uninit_ctrl() [+ + +]

Author: Ming Lei <ming.lei@redhat.com>
Date:   Tue Aug 13 09:35:27 2024 +0800

    nvme: move stopping keep-alive into nvme_uninit_ctrl()
    
    [ Upstream commit a54a93d0e3599b05856971734e15418ac551a14c ]
    
    Commit 4733b65d82bd ("nvme: start keep-alive after admin queue setup")
    moved starting keep-alive from nvme_start_ctrl() into
    nvme_init_ctrl_finish(), but did not move stopping keep-alive into
    nvme_uninit_ctrl(). As a result, keep-alive work can be started and stay
    pending after the controller fails to start, and a use-after-free is
    triggered if the nvme host driver is then unloaded.
    
    This patch fixes a kernel panic seen when running nvme/004 with a
    connection failure triggered, by moving stopping keep-alive into
    nvme_uninit_ctrl().
    
    This is reasonable because keep-alive is now started in
    nvme_init_ctrl_finish().
    
    Fixes: 3af755a46881 ("nvme: move nvme_stop_keep_alive() back to original position")
    Cc: Hannes Reinecke <hare@suse.de>
    Cc: Mark O'Donovan <shiftee@posteo.net>
    Reported-by: Changhui Zhong <czhong@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-af: Fix CPT AF register offset calculation [+ + +]

Author: Bharat Bhushan <bbhushan2@marvell.com>
Date:   Wed Aug 21 12:35:58 2024 +0530

    octeontx2-af: Fix CPT AF register offset calculation
    
    [ Upstream commit af688a99eb1fc7ef69774665d61e6be51cea627a ]
    
    Some CPT AF registers are per LF and others are global. Translation
    of the PF/VF local LF slot number to the actual LF slot number is
    required only for accessing per-LF registers. Access to CPT AF global
    registers does not require any LF slot number. Also, there is no reason
    for the CPT PF/VF to know the actual LF's register offset.
    
    Without this fix microcode loading will fail, VFs cannot be created
    and hardware is not usable.
    
    Fixes: bc35e28af789 ("octeontx2-af: replace cpt slot with lf id on reg write")
    Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20240821070558.1020101-1-bbhushan2@marvell.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/bpf: Don't call bpf_overflow_handler() for tracing events [+ + +]

Author: Kyle Huey <me@kylehuey.com>
Date:   Tue Aug 13 15:17:27 2024 +0000

    perf/bpf: Don't call bpf_overflow_handler() for tracing events
    
    commit 100bff23818eb61751ed05d64a7df36ce9728a4d upstream.
    
    The regressing commit is new in 6.10. It assumed that anytime event->prog
    is set bpf_overflow_handler() should be invoked to execute the attached bpf
    program. This assumption is false for tracing events, and as a result the
    regressing commit broke bpftrace by invoking the bpf handler with garbage
    inputs on overflow.
    
    Prior to the regression the overflow handlers formed a chain (of length 0,
    1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added
    bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog()
    (the tracing case) did not. Both set event->prog. The chain of overflow
    handlers was replaced by a single overflow handler slot and a fixed call to
    bpf_overflow_handler() when appropriate. This modifies the condition there
    to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the
    previous behavior and fixing bpftrace.
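    
    The restored condition can be sketched as a predicate. The types and
    names below are hypothetical miniatures of the perf event and BPF
    program structures; only the shape of the check (program attached AND
    of the perf-event type) follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_prog_type {
	DEMO_PROG_TYPE_KPROBE,
	DEMO_PROG_TYPE_TRACEPOINT,
	DEMO_PROG_TYPE_PERF_EVENT,
};

struct demo_prog {
	enum demo_prog_type type;
};

struct demo_event {
	const struct demo_prog *prog;  /* set for both attach paths */
};

/* Run the overflow-time BPF handler only for programs that were written
 * to receive perf event overflow context. */
static bool demo_should_run_overflow_prog(const struct demo_event *event)
{
	assert(event != NULL);
	return event->prog != NULL &&
	       event->prog->type == DEMO_PROG_TYPE_PERF_EVENT;
}
```

    Checking only whether a program is attached would return true for the
    tracing types as well, which corresponds to the regression described
    above.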
    
    Signed-off-by: Kyle Huey <khuey@kylehuey.com>
    Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
    Reported-by: Joe Damato <jdamato@fastly.com>
    Closes: https://lore.kernel.org/lkml/ZpFfocvyF3KHaSzF@LQ3V64L9R2/
    Fixes: f11f10bfa1ca ("perf/bpf: Call BPF handler directly, not through overflow machinery")
    Cc: stable@vger.kernel.org
    Tested-by: Joe Damato <jdamato@fastly.com> # bpftrace
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20240813151727.28797-1-jdamato@fastly.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
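
The condition described in the perf/bpf entry above can be sketched as a
small predicate. This is a minimal model, not the kernel code: the
model_* types and names are hypothetical stand-ins, and only the idea
(run the BPF overflow handler only when a program is attached and its
type is the perf-event type) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum model_prog_type {
	MODEL_PROG_TYPE_KPROBE,     /* attached through the tracing path */
	MODEL_PROG_TYPE_PERF_EVENT  /* attached through the overflow path */
};

struct model_prog {
	enum model_prog_type type;
};

struct model_event {
	const struct model_prog *prog; /* NULL when no program is attached */
};

/*
 * Returns true only when a program is attached AND it is a perf-event
 * program. Checking "prog != NULL" alone is the assumption that the
 * commit message describes as false for tracing events.
 */
static bool model_should_run_bpf_overflow(const struct model_event *event)
{
	assert(event != NULL);
	return event->prog != NULL &&
	       event->prog->type == MODEL_PROG_TYPE_PERF_EVENT;
}
```

In this model, an event carrying a kprobe-type program has a non-NULL
prog but still does not reach the overflow handler.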

pidfd: prevent creation of pidfds for kthreads [+ + +]

Author: Christian Brauner <brauner@kernel.org>
Date:   Wed Jul 31 12:01:12 2024 +0200

    pidfd: prevent creation of pidfds for kthreads
    
    commit 3b5bbe798b2451820e74243b738268f51901e7d0 upstream.
    
    It's currently possible to create pidfds for kthreads but it is unclear
    what that is supposed to mean. Until we have use-cases for it and have
    figured out what behavior we want, block the creation of pidfds for
    kthreads.
    
    Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
    Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

platform/surface: aggregator: Fix warning when controller is destroyed in probe [+ + +]

Author: Maximilian Luz <luzmaximilian@gmail.com>
Date:   Sun Aug 11 14:46:44 2024 +0200

    platform/surface: aggregator: Fix warning when controller is destroyed in probe
    
    [ Upstream commit bc923d594db21bee0ead128eb4bb78f7e77467a4 ]
    
    There is a small window in ssam_serial_hub_probe() where the controller
    is initialized but has not been started yet. Specifically, between
    ssam_controller_init() and ssam_controller_start(). Any failure in this
    window, for example caused by a failure of serdev_device_open(),
    currently results in an incorrect warning being emitted.
    
    In particular, any failure in this window results in the controller
    being destroyed via ssam_controller_destroy(). This function checks the
    state of the controller and, in an attempt to validate that the
    controller has been cleanly shut down before we try and deallocate any
    resources, emits a warning if that state is not SSAM_CONTROLLER_STOPPED.
    
    However, since we have only just initialized the controller and have not
    yet started it, its state is SSAM_CONTROLLER_INITIALIZED. Note that this
    is the only point at which the controller has this state, as it will
    change after we start the controller with ssam_controller_start() and
    never revert back. Further, at this point no communication has taken
    place and the sender and receiver threads have not been started yet (and
    we may not even have an open serdev device either).
    
    Therefore, it is perfectly safe to call ssam_controller_destroy() with a
    state of SSAM_CONTROLLER_INITIALIZED. This, however, means that the
    warning currently being emitted is incorrect. Fix it by extending the
    check.
    
    Fixes: c167b9c7e3d6 ("platform/surface: Add Surface Aggregator subsystem")
    Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
    Link: https://lore.kernel.org/r/20240811124645.246016-1-luzmaximilian@gmail.com
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
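
The extended check described in the Surface Aggregator entry above can be
sketched as follows. This is a minimal model under stated assumptions:
the enum and function names are hypothetical, and the only fact taken
from the commit message is that destroying a controller is safe, and so
should not warn, in both the stopped and the initialized-but-not-started
states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the controller life cycle. */
enum model_ctrl_state {
	MODEL_CTRL_UNINITIALIZED,
	MODEL_CTRL_INITIALIZED, /* init done, start not yet called */
	MODEL_CTRL_STARTED,
	MODEL_CTRL_STOPPED
};

/*
 * Returns true when destroying the controller in this state is expected
 * and must not trigger a warning. Before the fix, only the STOPPED state
 * would have passed such a check.
 */
static bool model_destroy_is_clean(enum model_ctrl_state state)
{
	return state == MODEL_CTRL_STOPPED ||
	       state == MODEL_CTRL_INITIALIZED;
}
```

A controller that is still running remains a case worth warning about in
this model, which is why the check is extended rather than removed.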

platform/x86: dell-uart-backlight: Use acpi_video_get_backlight_type() [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Wed Aug 14 21:01:58 2024 +0200

    platform/x86: dell-uart-backlight: Use acpi_video_get_backlight_type()
    
    commit b5f0943001339c4d324a1af10470ce0bdd79f966 upstream.
    
    The dell-uart-backlight driver supports backlight control on Dell All In
    One (AIO) models using a backlight controller board connected to a UART.
    
    In DSDT this UART port will be defined as:
    
       Name (_HID, "DELL0501")
       Name (_CID, EisaId ("PNP0501"))
    
    Now the first AIO has turned up which has not only the DSDT bits for this,
    but also an actual controller attached to the UART, yet it is not using
    this controller for backlight control.
    
    Use the acpi_video_get_backlight_type() function from the ACPI video-detect
    code to check if the dell-uart-backlight driver should actually be used.
    This allows reusing the existing ACPI video-detect infra to override
    the backlight control method on the commandline or with DMI quirks.
    
    Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver")
    Cc: All applicable <stable@vger.kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Link: https://patch.msgid.link/20240814190159.15650-3-hdegoede@redhat.com
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

platform/x86: ISST: Fix return value on last invalid resource [+ + +]

Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Date:   Fri Aug 16 09:36:26 2024 -0700

    platform/x86: ISST: Fix return value on last invalid resource
    
    commit 46ee21e9f59205e54943dfe51b2dc8a9352ca37d upstream.
    
    When only the last resource is invalid, tpmi_sst_dev_add() returns an
    error even if there are other valid resources before it. This function
    should return an error only when there are no valid resources.
    
    tpmi_sst_dev_add() returns the "ret" variable, which contains the
    failure status of the last call to sst_main(), which failed for the
    invalid resource, even though there may be other valid resources
    before the last entry.
    
    To address this, do not update the "ret" variable with the sst_main()
    return status.
    
    The case of no valid resources is already checked for by !inst
    below the loop, and -ENODEV is returned.
    
    Fixes: 9d1d36268f3d ("platform/x86: ISST: Support partitioned systems")
    Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
    Cc: stable@vger.kernel.org # 6.10+
    Link: https://lore.kernel.org/r/20240816163626.415762-1-srinivas.pandruvada@linux.intel.com
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
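
The return-value logic described in the ISST entry above can be sketched
like this. It is a minimal model, not the driver code: model_sst_main()
and model_dev_add() are hypothetical, and negative resource ids are an
assumption used here to stand for invalid resources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-resource setup: negative ids model invalid resources. */
static int model_sst_main(int resource_id)
{
	return resource_id < 0 ? -EINVAL : 0;
}

/*
 * Sets up every resource and counts the ones that succeeded. The status
 * of an individual failure is deliberately not stored, so an invalid
 * last entry cannot turn an otherwise successful call into an error.
 */
static int model_dev_add(const int *resources, size_t count)
{
	size_t inst = 0;

	assert(resources != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (model_sst_main(resources[i]))
			continue; /* skip this resource, keep going */
		inst++;
	}

	/* Only "no valid resource at all" is reported as an error. */
	return inst ? 0 : -ENODEV;
}
```

With the resources {1, 2, -1} this model returns 0, while {-1, -1}
returns -ENODEV.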

pmdomain: imx: scu-pd: Remove duplicated clocks [+ + +]

Author: Alexander Stein <alexander.stein@ew.tq-group.com>
Date:   Wed Jul 17 10:03:33 2024 +0200

    pmdomain: imx: scu-pd: Remove duplicated clocks
    
    commit 50359c9c3cb3e55e840e3485f5ee37da5b2b16b6 upstream.
    
    These clocks are already added to the list. Remove the duplicate ones.
    
    Fixes: a67d780720ff ("genpd: imx: scu-pd: add more PDs")
    Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240717080334.2210988-1-alexander.stein@ew.tq-group.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pmdomain: imx: wait SSAR when i.MX93 power domain on [+ + +]

Author: Peng Fan <peng.fan@nxp.com>
Date:   Wed Aug 14 20:47:40 2024 +0800

    pmdomain: imx: wait SSAR when i.MX93 power domain on
    
    commit 52dd070c62e4ae2b5e7411b920e3f7a64235ecfb upstream.
    
    With "quiet" set in bootargs, there is a power domain failure:
    "imx93_power_domain 44462400.power-domain: pd_off timeout: name:
     44462400.power-domain, stat: 4"
    
    The current power-on operation takes the ISO state as the flag that
    power-on has finished, but that is wrong. If a power-off request comes
    before the power-on operation has really finished, the power-off will
    never finish, because it does not actually trigger the hardware state
    machine to run while the previous power-on is still in progress. SSAR
    is the last step when powering on a domain, so wait for SSAR to be done
    when powering on.
    
    Since the EdgeLock Enclave (ELE) handshake is involved in the flow,
    enlarge the waiting time to 10ms for both on and off to avoid timeouts.
    
    Cc: stable@vger.kernel.org
    Fixes: 0a0f7cc25d4a ("soc: imx: add i.MX93 SRC power domain driver")
    Reviewed-by: Jacky Bai <ping.bai@nxp.com>
    Signed-off-by: Peng Fan <peng.fan@nxp.com>
    Link: https://lore.kernel.org/r/20240814124740.2778952-1-peng.fan@oss.nxp.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/topology: Check if a core is online [+ + +]

Author: Nysal Jan K.A <nysal@linux.ibm.com>
Date:   Wed Jul 31 08:31:13 2024 +0530

    powerpc/topology: Check if a core is online
    
    [ Upstream commit 227bbaabe64b6f9cd98aa051454c1d4a194a8c6a ]
    
    topology_is_core_online() checks if the core a CPU belongs to
    is online. The core is online if at least one of the sibling
    CPUs is online. The first CPU of an online core is also online
    in the common case, so this should be fairly quick.
    
    Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support")
    Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com>
    Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20240731030126.956210-3-nysal@linux.ibm.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>
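
The rule stated in the powerpc entry above (a core is online if at least
one of its sibling CPUs is online) can be sketched as below. This is a
minimal model: the function name and the array-based online map are
hypothetical, and the layout where sibling threads occupy consecutive CPU
numbers is an assumption made only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if any sibling thread of the core that 'cpu' belongs to
 * is online. Assumes siblings are numbered consecutively, so the first
 * thread of the core is cpu rounded down to a multiple of
 * threads_per_core.
 */
static bool model_core_is_online(const bool *cpu_online, size_t nr_cpus,
				 size_t threads_per_core, size_t cpu)
{
	assert(cpu_online != NULL && threads_per_core > 0 && cpu < nr_cpus);

	size_t first = cpu - (cpu % threads_per_core);

	/* In the common case the first sibling is online and we stop here. */
	for (size_t i = first; i < first + threads_per_core && i < nr_cpus; i++) {
		if (cpu_online[i])
			return true;
	}
	return false;
}
```

The loop exits on the first online sibling, which matches the remark in
the commit message that the check should be fairly quick in the common
case.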

printk/panic: Allow cpu backtraces to be written into ringbuffer during panic [+ + +]

Author: Ryo Takakura <takakura@valinux.co.jp>
Date:   Mon Aug 12 16:27:03 2024 +0900

    printk/panic: Allow cpu backtraces to be written into ringbuffer during panic
    
    [ Upstream commit bcc954c6caba01fca143162d5fbb90e46aa1ad80 ]
    
    Commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
    to ringbuffer") stopped non-panic CPUs from writing further messages
    to the ringbuffer after a panic.
    
    Since that commit, non-panicked CPUs are not allowed to write to the
    ringbuffer after a panic, so the CPU backtrace that is triggered
    after a panic to sample the non-panicked CPUs' backtraces no longer
    serves its function, as it has nothing to print.
    
    Fix the issue by allowing non-panicked CPUs to write into the
    ringbuffer while a CPU backtrace is in flight.
    
    Fixes: 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer")
    Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
    Reviewed-by: Petr Mladek <pmladek@suse.com>
    Link: https://lore.kernel.org/r/20240812072703.339690-1-takakura@valinux.co.jp
    Signed-off-by: Petr Mladek <pmladek@suse.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "ACPI: EC: Evaluate orphan _REG under EC device" [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Aug 12 15:08:04 2024 +0200

    Revert "ACPI: EC: Evaluate orphan _REG under EC device"
    
    commit 779bac9994452f6a894524f70c00cfb0cd4b6364 upstream.
    
    This reverts commit 0e6b6dedf168 ("Revert "ACPI: EC: Evaluate orphan
    _REG under EC device") because the problem addressed by it will be
    addressed differently in what follows.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Cc: All applicable <stable@vger.kernel.org>
    Link: https://patch.msgid.link/3236716.5fSG56mABF@rjwysocki.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD" [+ + +]

Author: Griffin Kroah-Hartman <griffin@kroah.com>
Date:   Thu Aug 15 11:49:20 2024 +0200

    Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
    
    commit 9bb5e74b2bf88fbb024bb15ded3b011e02c673be upstream.
    
    This reverts commit bab2f5e8fd5d2f759db26b78d9db57412888f187.
    
    Joel reported that this commit breaks userspace and stops sensors in
    SDM845 from working. It also breaks other qcom SoC devices running
    postmarketOS.
    
    Cc: stable <stable@kernel.org>
    Cc: Ekansh Gupta <quic_ekangupt@quicinc.com>
    Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Reported-by: Joel Selvaraj <joelselvaraj.oss@gmail.com>
    Link: https://lore.kernel.org/r/9a9f5646-a554-4b65-8122-d212bb665c81@umsystem.edu
    Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com>
    Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
    Fixes: bab2f5e8fd5d ("misc: fastrpc: Restrict untrusted app to attach to privileged PD")
    Link: https://lore.kernel.org/r/20240815094920.8242-1-griffin@kroah.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "pidfd: prevent creation of pidfds for kthreads" [+ + +]

Author: Christian Brauner <brauner@kernel.org>
Date:   Mon Aug 19 10:38:23 2024 +0200

    Revert "pidfd: prevent creation of pidfds for kthreads"
    
    commit 232590ea7fc125986a526e03081b98e5783f70d2 upstream.
    
    This reverts commit 3b5bbe798b2451820e74243b738268f51901e7d0.
    
    Eric reported that systemd-shutdown gets broken by blocking the creation
    of pidfds for kthreads, as older versions seem to rely on being able to
    create a pidfd for any process in /proc.
    
    Reported-by: Eric Biggers <ebiggers@kernel.org>
    Link: https://lore.kernel.org/r/20240818035818.GA1929@sol.localdomain
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "serial: 8250_omap: Set the console genpd always on if no console suspend" [+ + +]

Author: Griffin Kroah-Hartman <griffin@kroah.com>
Date:   Wed Aug 14 13:17:47 2024 +0200

    Revert "serial: 8250_omap: Set the console genpd always on if no console suspend"
    
    commit 0863bffda1131fd2fa9c05b653ad9ee3d8db127e upstream.
    
    This reverts commit 68e6939ea9ec3d6579eadeab16060339cdeaf940.
    
    Kevin reported that this causes a crash during suspend on platforms that
    don't use PM domains.
    
    Link: https://lore.kernel.org/r/7ha5hgpchq.fsf@baylibre.com
    Cc: Thomas Richard <thomas.richard@bootlin.com>
    Fixes: 68e6939ea9ec ("serial: 8250_omap: Set the console genpd always on if no console suspend")
    Cc: stable <stable@kernel.org>
    Reported-by: Kevin Hilman <khilman@kernel.org>
    Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com>
    Link: https://lore.kernel.org/r/20240814111747.82371-1-griffin@kroah.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET" [+ + +]

Author: Xu Yang <xu.yang_2@nxp.com>
Date:   Fri Aug 9 19:29:01 2024 +0800

    Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET"
    
    commit 21ea1ce37fc267dc45fe27517bbde926211683df upstream.
    
    This reverts commit bf20c69cf3cf9c6445c4925dd9a8a6ca1b78bfdf.
    
    During the tcpm_init() stage, if VBUS is still present after
    tcpm_reset_port(), we assume that VBUS will turn off and go to safe0v
    after a specific discharge time. A TCPM_VBUS_EVENT event follows when
    VBUS reaches the off state. The TCPM_VBUS_EVENT event may be set during
    the PORT_RESET handling stage. If pd_events is reset to 0 after
    TCPM_VBUS_EVENT is set, we will lose this VBUS event. Then the port
    state machine may get stuck in one state.
    
    Before:
    
    [    2.570172] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [rev1 NONE_AMS]
    [    2.570179] state change PORT_RESET -> PORT_RESET_WAIT_OFF [delayed 100 ms]
    [    2.570182] pending state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED @ 920 ms [rev1 NONE_AMS]
    [    3.490213] state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED [delayed 920 ms]
    [    3.490220] Start toggling
    [    3.546050] CC1: 0 -> 0, CC2: 0 -> 2 [state TOGGLING, polarity 0, connected]
    [    3.546057] state change TOGGLING -> SRC_ATTACH_WAIT [rev1 NONE_AMS]
    
    After reverting this patch, we can see the VBUS off event and the port
    goes to the expected state.
    
    [    2.441992] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [rev1 NONE_AMS]
    [    2.441999] state change PORT_RESET -> PORT_RESET_WAIT_OFF [delayed 100 ms]
    [    2.442002] pending state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED @ 920 ms [rev1 NONE_AMS]
    [    2.442122] VBUS off
    [    2.442125] state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED [rev1 NONE_AMS]
    [    2.442127] VBUS VSAFE0V
    [    2.442351] CC1: 0 -> 0, CC2: 0 -> 0 [state SNK_UNATTACHED, polarity 0, disconnected]
    [    2.442357] Start toggling
    [    2.491850] CC1: 0 -> 0, CC2: 0 -> 2 [state TOGGLING, polarity 0, connected]
    [    2.491858] state change TOGGLING -> SRC_ATTACH_WAIT [rev1 NONE_AMS]
    [    2.491863] pending state change SRC_ATTACH_WAIT -> SNK_TRY @ 200 ms [rev1 NONE_AMS]
    [    2.691905] state change SRC_ATTACH_WAIT -> SNK_TRY [delayed 200 ms]
    
    Fixes: bf20c69cf3cf ("usb: typec: tcpm: clear pd_event queue in PORT_RESET")
    Cc: stable@vger.kernel.org
    Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
    Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Link: https://lore.kernel.org/r/20240809112901.535072-1-xu.yang_2@nxp.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
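
The lost-event problem described in the tcpm entry above can be sketched
with a pending-event bitmask. This is a minimal model, not the tcpm code:
the MODEL_* bits and both functions are hypothetical, and the single
boolean stands for "the reverted commit is applied" (the queue is cleared
during port reset).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event bits in a pending-event mask. */
#define MODEL_VBUS_EVENT (1u << 0)
#define MODEL_CC_EVENT   (1u << 1)

/*
 * Returns the events still pending after the port-reset step.
 * clear_queue == true models zeroing the queue during reset, which
 * discards an event that was raised while the reset was being handled.
 */
static unsigned int model_port_reset(unsigned int pending, bool clear_queue)
{
	return clear_queue ? 0u : pending;
}

/* True if the state machine would still get to see the VBUS event. */
static bool model_vbus_event_seen(unsigned int pending)
{
	return (pending & MODEL_VBUS_EVENT) != 0;
}
```

When the VBUS event is raised before the queue is cleared, the model
never reports it, which corresponds to the "Before" log in the entry
where no "VBUS off" line appears.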

riscv: change XIP's kernel_map.size to be size of the entire kernel [+ + +]

Author: Nam Cao <namcao@linutronix.de>
Date:   Wed May 8 21:19:17 2024 +0200

    riscv: change XIP's kernel_map.size to be size of the entire kernel
    
    commit 57d76bc51fd80824bcc0c84a5b5ec944f1b51edd upstream.
    
    With an XIP kernel, kernel_map.size is set to only the size of the data
    part of the kernel. This is inconsistent with a "normal" kernel, which
    sets it to the size of the entire kernel.
    
    More importantly, an XIP kernel fails to boot if CONFIG_DEBUG_VIRTUAL is
    enabled, because there are checks on virtual addresses with the assumption
    that kernel_map.size is the size of the entire kernel (these checks are in
    arch/riscv/mm/physaddr.c).
    
    Change XIP's kernel_map.size to be the size of the entire kernel.
    
    Signed-off-by: Nam Cao <namcao@linutronix.de>
    Cc: <stable@vger.kernel.org> # v6.1+
    Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Link: https://lore.kernel.org/r/20240508191917.2892064-1-namcao@linutronix.de
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: entry: always initialize regs->a0 to -ENOSYS [+ + +]

Author: Celeste Liu <coelacanthushex@gmail.com>
Date:   Thu Jun 27 22:23:39 2024 +0800

    riscv: entry: always initialize regs->a0 to -ENOSYS
    
    commit 61119394631f219e23ce98bcc3eb993a64a8ea64 upstream.
    
    Otherwise when the tracer changes syscall number to -1, the kernel fails
    to initialize a0 with -ENOSYS and subsequently fails to return the error
    code of the failed syscall to userspace. For example, it will break
    strace syscall tampering.
    
    Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1")
    Reported-by: "Dmitry V. Levin" <ldv@strace.io>
    Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
    Link: https://lore.kernel.org/r/20240627142338.5114-2-CoelacanthusHex@gmail.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
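
The behavior described in the riscv entry above can be sketched with a
toy dispatcher. This is a minimal model, not the kernel entry code:
struct model_regs, the one-entry syscall table and model_do_syscall() are
hypothetical, and only the rule "always preset a0 to -ENOSYS before
deciding whether to dispatch" comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical register file with just the return-value register. */
struct model_regs {
	long a0;
};

typedef long (*model_syscall_fn)(void);

/* A single toy syscall so that the table is not empty. */
static long model_sys_answer(void)
{
	return 42;
}

static const model_syscall_fn model_table[] = { model_sys_answer };
#define MODEL_NR_SYSCALLS (sizeof(model_table) / sizeof(model_table[0]))

/*
 * 'nr' is the syscall number after a tracer had the chance to change it.
 * a0 is preset unconditionally, so a tracer that rewrites nr to -1 still
 * makes the skipped syscall return -ENOSYS to userspace.
 */
static void model_do_syscall(struct model_regs *regs, long nr)
{
	assert(regs != NULL);

	regs->a0 = -ENOSYS;
	if (nr >= 0 && (size_t)nr < MODEL_NR_SYSCALLS)
		regs->a0 = model_table[nr]();
}
```

If the preset were skipped when nr is -1, a0 would keep whatever value it
held on entry, which is the failure mode the commit message attributes to
the earlier change.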

rtla/osnoise: Prevent NULL dereference in error handling [+ + +]

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Fri Aug 9 15:34:30 2024 +0300

    rtla/osnoise: Prevent NULL dereference in error handling
    
    commit 90574d2a675947858b47008df8d07f75ea50d0d0 upstream.
    
    If the "tool->data" allocation fails then there is no need to call
    osnoise_free_top() and, in fact, doing so will lead to a NULL dereference.
    
    Cc: stable@vger.kernel.org
    Cc: John Kacur <jkacur@redhat.com>
    Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
    Cc: Clark Williams <williams@redhat.com>
    Fixes: 1eceb2fc2ca5 ("rtla/osnoise: Add osnoise top mode")
    Link: https://lore.kernel.org/f964ed1f-64d2-4fde-ad3e-708331f8f358@stanley.mountain
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
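
The error-handling pattern described in the rtla/osnoise entry above can
be sketched as follows. This is a minimal model, not the rtla source: all
model_* names are hypothetical, and the injectable allocator exists only
so that the failure path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

struct model_tool {
	void *data;
};

typedef void *(*model_alloc_fn)(size_t size);

/* Allocator that always fails, used to exercise the error path. */
static void *model_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Teardown that dereferences tool->data, so data must be valid here. */
static void model_free_top(struct model_tool *tool)
{
	assert(tool != NULL && tool->data != NULL);
	free(tool->data);
	tool->data = NULL;
}

/*
 * Allocates the tool and its data. When the data allocation fails, only
 * the tool itself is released; the data teardown is skipped because
 * there is nothing to tear down and it would dereference NULL.
 */
static struct model_tool *model_init_top(model_alloc_fn alloc)
{
	struct model_tool *tool = calloc(1, sizeof(*tool));

	if (!tool)
		return NULL;

	tool->data = alloc(64);
	if (!tool->data) {
		free(tool);
		return NULL;
	}
	return tool;
}
```

Calling model_init_top(model_failing_alloc) returns NULL without touching
model_free_top(), while model_init_top(malloc) yields a tool that can
later be released with model_free_top() followed by free().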

rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Sat Jul 27 23:03:00 2024 +0900

    rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
    
    [ Upstream commit aacf93e87f0d808ef46e621aa56caea336b4433c ]
    
    Another oddity in these config entries is their default value can fall
    back to 'n', which is a value for bool or tristate symbols.
    
    The '|| echo n' is an incorrect workaround to avoid the syntax error.
    This is not a big deal, as the entry is hidden by 'depends on RUST' in
    situations where '$(RUSTC) --version' or '$(BINDGEN) --version' fails.
    Anyway, it looks odd.
    
    The default of a string type symbol should be a double-quoted string
    literal. Turn it into an empty string when the version command fails.
    
    Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support")
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Link: https://lore.kernel.org/r/20240727140302.1806011-2-masahiroy@kernel.org
    [ Rebased on top of v6.11-rc1. - Miguel ]
    Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Sat Jul 27 23:02:59 2024 +0900

    rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
    
    [ Upstream commit 5ce86c6c861352c9346ebb5c96ed70cb67414aa3 ]
    
    While this is a somewhat unusual case, I encountered odd error messages
    when I ran Kconfig in a foreign architecture chroot.
    
      $ make allmodconfig
      sh: 1: rustc: not found
      sh: 1: bindgen: not found
      #
      # configuration written to .config
      #
    
    The successful execution of 'command -v rustc' does not necessarily mean
    that 'rustc --version' will succeed.
    
      $ sh -c 'command -v rustc'
      /home/masahiro/.cargo/bin/rustc
      $ sh -c 'rustc --version'
      sh: 1: rustc: not found
    
    Here, 'rustc' is built for x86, and I ran it in an arm64 system.
    
    The current code:
    
      command -v $(RUSTC) >/dev/null 2>&1 && $(RUSTC) --version || echo n
    
    can be turned into:
    
      command -v $(RUSTC) >/dev/null 2>&1 && $(RUSTC) --version 2>/dev/null || echo n
    
    However, I did not understand the necessity of 'command -v $(RUSTC)'.
    
    I simplified it to:
    
      $(RUSTC) --version 2>/dev/null || echo n
    
    Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support")
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Link: https://lore.kernel.org/r/20240727140302.1806011-1-masahiroy@kernel.org
    [ Rebased on top of v6.11-rc1. - Miguel ]
    Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

rust: work around `bindgen` 0.69.0 issue [+ + +]

Author: Miguel Ojeda <ojeda@kernel.org>
Date:   Tue Jul 9 18:06:03 2024 +0200

    rust: work around `bindgen` 0.69.0 issue
    
    [ Upstream commit 9e98db17837093cb0f4dcfcc3524739d93249c45 ]
    
    `bindgen` 0.69.0 contains a bug: `--version` does not work without
    providing a header [1]:
    
        error: the following required arguments were not provided:
          <HEADER>
    
        Usage: bindgen <FLAGS> <OPTIONS> <HEADER> -- <CLANG_ARGS>...
    
    Thus, in preparation for supporting several `bindgen` versions, work
    around the issue by passing a dummy argument.
    
    Include a comment so that we can remove the workaround in the future.
    
    Link: https://github.com/rust-lang/rust-bindgen/pull/2678 [1]
    Reviewed-by: Finn Behrens <me@kloenk.dev>
    Tested-by: Benno Lossin <benno.lossin@proton.me>
    Tested-by: Andreas Hindborg <a.hindborg@samsung.com>
    Link: https://lore.kernel.org/r/20240709160615.998336-9-ojeda@kernel.org
    Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
    Stable-dep-of: 5ce86c6c8613 ("rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

s390/ap: Refine AP bus bindings complete processing [+ + +]

Author: Harald Freudenberger <freude@linux.ibm.com>
Date:   Tue Aug 6 12:06:23 2024 +0200

    s390/ap: Refine AP bus bindings complete processing
    
    commit b4f5bd60d558f6ba451d7e76aa05782c07a182a3 upstream.
    
    With the rework of the AP bus scan and the introduction of
    a bindings complete completion, the time until userspace
    finally receives an AP bus binding complete uevent has also
    increased. Unfortunately, this event triggers some important
    jobs for the preparation of KVM guests, for example the modification
    of card/queue masks to reassign AP resources to the alternate
    AP queue device driver (vfio_ap), which is the precondition
    for building mediated devices, which may in turn be a precondition
    for starting KVM guests using AP resources.
    
    This small fix now triggers the check for binding complete
    each time an AP device driver has registered. With this patch
    the bindings complete may be posted up to 30s earlier as there
    is no need to wait for the next AP bus scan any more.
    
    Fixes: 778412ab915d ("s390/ap: rearm APQNs bindings complete completion")
    Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
    Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
    Cc: stable@vger.kernel.org
    Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
    Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/boot: Avoid possible physmem_info segment corruption [+ + +]

Author: Alexander Gordeev <agordeev@linux.ibm.com>
Date:   Wed Aug 21 18:55:06 2024 +0200

    s390/boot: Avoid possible physmem_info segment corruption
    
    [ Upstream commit d7fd2941ae9a67423d1c7bee985f240e4686634f ]
    
    When physical memory for the kernel image is allocated, the allocation
    does not account for the extra memory required for offsetting the image
    start to match the lower 20 bits of the KASLR virtual base address.
    That might lead to kernel accesses beyond its memory range.
    
    Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
    Fixes: 693d41f7c938 ("s390/mm: Restore mapping of kernel image using large pages")
    Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    Acked-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
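
The size calculation implied by the s390/boot entry above can be sketched
with plain arithmetic. This is a minimal model under stated assumptions:
the function and macro names are hypothetical, and the formula (image
size plus the low 20 bits of the KASLR virtual base) is an interpretation
of the commit message rather than a copy of the boot code.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_LOW_BITS 20u
#define MODEL_LOW_MASK ((UINT64_C(1) << MODEL_LOW_BITS) - 1)

/* Offset of the image start inside its 1 MiB (2^20 byte) aligned area. */
static uint64_t model_image_offset(uint64_t kaslr_vbase)
{
	return kaslr_vbase & MODEL_LOW_MASK;
}

/*
 * Physical memory needed so that the image can start at the same
 * low-20-bit offset as the virtual base without running past the end
 * of the allocation.
 */
static uint64_t model_required_size(uint64_t image_size, uint64_t kaslr_vbase)
{
	return image_size + model_image_offset(kaslr_vbase);
}
```

For example, with a virtual base of 0x12345678 the offset is 0x45678, so
an image of 0x800000 bytes needs 0x845678 bytes in this model; allocating
only 0x800000 would leave the last 0x45678 bytes of the image outside the
reserved range.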

s390/boot: Fix KASLR base offset off by __START_KERNEL bytes [+ + +]

Author: Alexander Gordeev <agordeev@linux.ibm.com>
Date:   Wed Aug 21 18:55:07 2024 +0200

    s390/boot: Fix KASLR base offset off by __START_KERNEL bytes
    
    [ Upstream commit 1642285e511c2a40b14e87a41aa8feace6123036 ]
    
    Symbol offsets to the KASLR base do not match symbol addresses in
    the vmlinux image. That is the result of setting the KASLR base
    to the beginning of the .text section as a result of an optimization.
    
    Revert that optimization and allocate virtual memory for the
    whole kernel image including __START_KERNEL bytes as per the
    linker script. That allows keeping the semantics of the KASLR
    base offset in sync with other architectures.
    
    Rename __START_KERNEL to TEXT_OFFSET, since it represents the
    offset of the .text section within the kernel image, rather than
    a virtual address.
    
    Still skip mapping TEXT_OFFSET bytes to save memory on pgtables
    and provoke exceptions in case an attempt to access this area is
    made, as no kernel symbol may reside there.
    
    In case CONFIG_KASAN is enabled the location counter might exceed
    the value of TEXT_OFFSET, while the decompressor linker script
    forcefully resets it to TEXT_OFFSET, which leads to a sections
    overlap link failure. Use MAX() expression to avoid that.
    
    Reported-by: Omar Sandoval <osandov@osandov.com>
    Closes: https://lore.kernel.org/linux-s390/ZnS8dycxhtXBZVky@telecaster.dhcp.thefacebook.com/
    Fixes: 56b1069c40c7 ("s390/boot: Rework deployment of the kernel image")
    Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    Acked-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

s390/dasd: fix error recovery leading to data corruption on ESE devices [+ + +]

Author: Stefan Haberland <sth@linux.ibm.com>
Date:   Mon Aug 12 14:57:33 2024 +0200

    s390/dasd: fix error recovery leading to data corruption on ESE devices
    
    commit 7db4042336580dfd75cb5faa82c12cd51098c90b upstream.
    
    Extent Space Efficient (ESE) or thin provisioned volumes need to be
    formatted on demand during usual IO processing.
    
    The dasd_ese_needs_format function checks for error codes that signal
    the non-existence of a proper track format.
    
    The check for incorrect length is too imprecise, since other error cases
    leading to transport of insufficient data also have this flag set.
    This might lead to data corruption in certain error cases, for example
    during a storage server warmstart.
    
    Fix by removing the check for incorrect length and replacing it with
    an explicit check for invalid track format in transport mode.
    
    Also remove the check for file protected since this is not a valid
    ESE handling case.
    
    Cc: stable@vger.kernel.org # 5.3+
    Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes")
    Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Link: https://lore.kernel.org/r/20240812125733.126431-3-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/dasd: Remove DMA alignment [+ + +]

Author: Eric Farman <farman@linux.ibm.com>
Date:   Mon Aug 12 14:57:32 2024 +0200

    s390/dasd: Remove DMA alignment
    
    [ Upstream commit 2a07bb64d80152701d507b1498237ed1b8d83866 ]
    
    This reverts commit bc792884b76f ("s390/dasd: Establish DMA alignment").
    
    Quoting the original commit:
        linux-next commit bf8d08532bc1 ("iomap: add support for dma aligned
        direct-io") changes the alignment requirement to come from the block
        device rather than the block size, and the default alignment
        requirement is 512-byte boundaries. Since DASD I/O has page
        alignments for IDAW/TIDAW requests, let's override this value to
        restore the expected behavior.
    
    I mentioned TIDAW, but that was wrong. TIDAWs have no distinct alignment
    requirement (per p. 15-70 of POPS SA22-7832-13):
    
       Unless otherwise specified, TIDAWs may designate
       a block of main storage on any boundary and length
       up to 4K bytes, provided the specified block does not
       cross a 4 K-byte boundary.
    
    IDAWs do, but the original commit neglected that while ECKD DASD are
    typically formatted in 4096-byte blocks, they don't HAVE to be. Formatting
    an ECKD volume with smaller blocks is permitted (dasdfmt -b xxx), and the
    problematic commit enforces alignment properties on such a device that
    will result in errors, such as:
    
       [test@host ~]# lsdasd -l a367 | grep blksz
         blksz:                             512
       [test@host ~]# mkfs.xfs -f /dev/disk/by-path/ccw-0.0.a367-part1
       meta-data=/dev/dasdc1            isize=512    agcount=4, agsize=230075 blks
                =                       sectsz=512   attr=2, projid32bit=1
                =                       crc=1        finobt=1, sparse=1, rmapbt=1
                =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
       data     =                       bsize=4096   blocks=920299, imaxpct=25
                =                       sunit=0      swidth=0 blks
       naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
       log      =internal log           bsize=4096   blocks=16384, version=2
                =                       sectsz=512   sunit=0 blks, lazy-count=1
       realtime =none                   extsz=4096   blocks=0, rtextents=0
       error reading existing superblock: Invalid argument
       mkfs.xfs: pwrite failed: Invalid argument
       libxfs_bwrite: write failed on (unknown) bno 0x70565c/0x100, err=22
       mkfs.xfs: Releasing dirty buffer to free list!
       found dirty buffer (bulk) on free list!
       mkfs.xfs: pwrite failed: Invalid argument
       ...snipped...
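    
    A minimal user-space sketch of why a page-sized alignment mask rejects
    I/O to a volume formatted with 512-byte blocks. It assumes a buffer is
    tested with "(addr | len) & mask"; the helper names and sample values
    are hypothetical and are not taken from the DASD driver:
    
    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helper: true when both the buffer address and the
     * length are multiples of (mask + 1). */
    static bool io_is_aligned(uint64_t addr, uint64_t len, uint64_t mask)
    {
        return ((addr | len) & mask) == 0;
    }

    /* A 512-byte request at a 512-byte boundary, checked against a mask
     * derived from the formatted block size (blksz=512). */
    static bool demo_block_size_mask(void)
    {
        return io_is_aligned(0x10200, 512, 512 - 1);   /* accepted */
    }

    /* The same request checked against a page-sized (4096-byte) mask. */
    static bool demo_page_size_mask(void)
    {
        return io_is_aligned(0x10200, 512, 4096 - 1);  /* rejected */
    }
    ```
    
    With the page-sized mask the same request fails the check, which is
    consistent with the "Invalid argument" errors shown above.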
    
    The original commit omitted the FBA discipline for just this reason,
    but the formatted block size of the other disciplines was overlooked.
    The solution to all of this is to revert to the original behavior,
    such that the block size can be respected. There were two commits [1]
    that moved this code in the interim, so a straight git-revert is not
    possible, but the change is straightforward.
    
    But what of the original problem? That was manifested with a direct-io
    QEMU guest, where QEMU itself was changed a month or two later with
    commit 25474d90aa ("block: use the request length for iov alignment")
    such that the blamed kernel commit is unnecessary.
    
    [1] commit 0127a47f58c6 ("dasd: move queue setup to common code")
        commit fde07a4d74e3 ("dasd: use the atomic queue limits API")
    
    Fixes: bc792884b76f ("s390/dasd: Establish DMA alignment")
    Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
    Signed-off-by: Eric Farman <farman@linux.ibm.com>
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Link: https://lore.kernel.org/r/20240812125733.126431-2-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

s390/iucv: Fix vargs handling in iucv_alloc_device() [+ + +]

Author: Alexandra Winter <wintera@linux.ibm.com>
Date:   Wed Aug 21 11:13:37 2024 +0200

    s390/iucv: Fix vargs handling in iucv_alloc_device()
    
    [ Upstream commit 0124fb0ebf3b0ef89892d42147c9387be3105318 ]
    
    iucv_alloc_device() gets a format string and a varying number of
    arguments. This is incorrectly forwarded by calling dev_set_name() with
    the format string and a va_list, while dev_set_name() also expects a
    varying number of arguments.
    
    Symptoms:
    Corrupted iucv device names, which can result in log messages like:
    sysfs: cannot create duplicate filename '/devices/iucv/hvc_iucv1827699952'
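    
    A user-space sketch of the general pitfall, not the kernel code:
    set_name() stands in for a variadic function such as dev_set_name(),
    and alloc_name() is a hypothetical variadic wrapper. The wrapper expands
    its arguments with a v*() function before forwarding; the actual fix may
    use a different mechanism:
    
    ```c
    #include <stdarg.h>
    #include <stdio.h>

    /* Stand-in for a variadic setter: it takes the format string plus
     * "...", not a va_list. */
    static int set_name(char *buf, size_t len, const char *fmt, ...)
    {
        va_list ap;
        int ret;

        va_start(ap, fmt);
        ret = vsnprintf(buf, len, fmt, ap);
        va_end(ap);
        return ret;
    }

    /* Hypothetical variadic wrapper. Calling set_name(buf, len, fmt, ap)
     * here would make "%d" read the va_list object instead of the
     * caller's integer. Expanding the arguments first and forwarding
     * the result through "%s" avoids that. */
    static int alloc_name(char *buf, size_t len, const char *fmt, ...)
    {
        char tmp[64];
        va_list ap;

        va_start(ap, fmt);
        vsnprintf(tmp, sizeof(tmp), fmt, ap);
        va_end(ap);
        return set_name(buf, len, "%s", tmp);
    }
    ```
    
    Forwarding the va_list directly is undefined behaviour in C, which
    matches the kind of random-looking numeric suffix in the name above.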
    
    Fixes: 4452e8ef8c36 ("s390/iucv: Provide iucv_alloc_device() / iucv_release_device()")
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1228425
    Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
    Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
    Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Link: https://patch.msgid.link/20240821091337.3627068-1-wintera@linux.ibm.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

s390/uv: Panic for set and remove shared access UVC errors [+ + +]

Author: Claudio Imbrenda <imbrenda@linux.ibm.com>
Date:   Thu Aug 1 13:25:48 2024 +0200

    s390/uv: Panic for set and remove shared access UVC errors
    
    [ Upstream commit cff59d8631e1409ffdd22d9d717e15810181b32c ]
    
    The return value of uv_set_shared() and uv_remove_shared() (which are
    wrappers around the share() function) is not always checked. The system
    integrity of a protected guest depends on the Share and Unshare UVCs
    being successful. This means that any caller that fails to check the
    return value will compromise the security of the protected guest.
    
    No code path that would lead to such violation of the security
    guarantees is currently exercised, since all the areas that are shared
    never get unshared during the lifetime of the system. This might
    change and become an issue in the future.
    
    The Share and Unshare UVCs can only fail in case of hypervisor
    misbehaviour (either a bug or malicious behaviour). In such cases there
    is no reasonable way forward, and the system needs to panic.
    
    This patch replaces the return at the end of the share() function with
    a panic, to guarantee system integrity.
    
    Fixes: 5abb9351dfd9 ("s390/uv: introduce guest side ultravisor code")
    Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
    Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Link: https://lore.kernel.org/r/20240801112548.85303-1-imbrenda@linux.ibm.com
    Message-ID: <20240801112548.85303-1-imbrenda@linux.ibm.com>
    [frankja@linux.ibm.com: Fixed up patch subject]
    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: core: Fix the return value of scsi_logical_block_count() [+ + +]

Author: Chaotian Jing <chaotian.jing@mediatek.com>
Date:   Tue Aug 13 13:34:10 2024 +0800

    scsi: core: Fix the return value of scsi_logical_block_count()
    
    commit f03e94f23b04c2b71c0044c1534921b3975ef10c upstream.
    
    scsi_logical_block_count() should return the block count of a given SCSI
    command. The original implementation ended up shifting twice, leading to an
    incorrect count being returned. Fix the conversion between bytes and
    logical blocks.
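    
    A small user-space sketch of the two unit conversions that are easy to
    mix up. It assumes a power-of-two logical block size of at least 512
    bytes; the function names are hypothetical and this is not the kernel
    helper itself:
    
    ```c
    #include <stdint.h>

    /* log2 of a power-of-two value (user-space stand-in for ilog2()). */
    static unsigned int log2_pow2(uint32_t v)
    {
        unsigned int shift = 0;

        while (v > 1) {
            v >>= 1;
            shift++;
        }
        return shift;
    }

    /* Bytes to logical blocks: shift once, by log2(block size). */
    static uint32_t blocks_from_bytes(uint32_t bytes, uint32_t block_size)
    {
        return bytes >> log2_pow2(block_size);
    }

    /* 512-byte sectors to logical blocks: the factor of 512 is already
     * divided out of the input, so only log2(block size) - 9 bits
     * remain to be shifted. */
    static uint32_t blocks_from_sectors(uint32_t sectors, uint32_t block_size)
    {
        return sectors >> (log2_pow2(block_size) - 9);
    }
    ```
    
    For 32768 bytes and 4096-byte blocks both helpers agree on 8 blocks when
    given the matching unit (32768 bytes or 64 sectors). Feeding the byte
    count to the sector variant instead yields 4096, an incorrect count.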
    
    Cc: stable@vger.kernel.org
    Fixes: 6a20e21ae1e2 ("scsi: core: Add helper to return number of logical blocks in a request")
    Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
    Link: https://lore.kernel.org/r/20240813053534.7720-1-chaotian.jing@mediatek.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftest: af_unix: Fix kselftest compilation warnings [+ + +]

Author: Abhinav Jain <jain.abhinav177@gmail.com>
Date:   Wed Aug 14 13:37:43 2024 +0530

    selftest: af_unix: Fix kselftest compilation warnings
    
    [ Upstream commit 6c569b77f0300f8a9960277c7094fa0f128eb811 ]
    
    Change expected_buf from (const void *) to (const char *)
    in function __recvpair().
    This change fixes the below warnings during test compilation:
    
    ```
    In file included from msg_oob.c:14:
    msg_oob.c: In function ‘__recvpair’:
    
    ../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
    of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
    
    ../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
    msg_oob.c:235:17: note: in expansion of macro ‘TH_LOG’
    
    ../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
    of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
    
    ../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
    msg_oob.c:259:25: note: in expansion of macro ‘TH_LOG’
    ```
    
    Fixes: d098d77232c3 ("selftest: af_unix: Add msg_oob.c.")
    Signed-off-by: Abhinav Jain <jain.abhinav177@gmail.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://patch.msgid.link/20240814080743.1156166-1-jain.abhinav177@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests/bpf: Add a test to verify previous stacksafe() fix [+ + +]

Author: Yonghong Song <yonghong.song@linux.dev>
Date:   Mon Aug 12 14:48:52 2024 -0700

    selftests/bpf: Add a test to verify previous stacksafe() fix
    
    commit 662c3e2db00f92e50c26e9dc4fe47c52223d9982 upstream.
    
    A selftest is added such that without the previous patch,
    a crash can happen. With the previous patch, the test can
    run successfully. The new test is written in a way which
    mimics original crash case:
      main_prog
        static_prog_1
          static_prog_2
    where static_prog_1 has different paths to static_prog_2:
    some paths have stack allocated and others do not.
    A stacksafe() check in static_prog_2() triggered the crash.
    
    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20240812214852.214037-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: memfd_secret: don't build memfd_secret test on unsupported arches [+ + +]

Author: Muhammad Usama Anjum <usama.anjum@collabora.com>
Date:   Fri Aug 9 12:56:42 2024 +0500

    selftests: memfd_secret: don't build memfd_secret test on unsupported arches
    
    commit 7c5e8d212d7d81991a580e7de3904ea213d9a852 upstream.
    
    [1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
    x86_64 for now.  It doesn't support other architectures.  I found the
    build error on arm and decided to send the fix as it was creating noise on
    KernelCI:
    
    memfd_secret.c: In function 'memfd_secret':
    memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
    did you mean 'memfd_secret'?
       42 |         return syscall(__NR_memfd_secret, flags);
          |                        ^~~~~~~~~~~~~~~~~
          |                        memfd_secret
    
    Hence I'm adding a condition so that memfd_secret is only compiled on
    supported architectures.
    
    Also check in the run_vmtests script whether the memfd_secret binary is
    present before executing it.
    
    Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
    Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
    Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
    Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
    Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
    Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
    Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
    Cc: Albert Ou <aou@eecs.berkeley.edu>
    Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mlxsw: ethtool_lanes: Source ethtool lib from correct path [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 20 12:53:47 2024 +0200

    selftests: mlxsw: ethtool_lanes: Source ethtool lib from correct path
    
    [ Upstream commit f8669d7b5f5d2d88959456ae9123d8bb6fdc1ebe ]
    
    Source the ethtool library from the correct path and avoid the following
    error:
    
    ./ethtool_lanes.sh: line 14: ./../../../net/forwarding/ethtool_lib.sh: No such file or directory
    
    Fixes: 40d269c000bd ("selftests: forwarding: Move several selftests")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/2112faff02e536e1ac14beb4c2be09c9574b90ae.1724150067.git.petrm@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: mptcp: join: check re-using ID of closed subflow [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:22 2024 +0200

    selftests: mptcp: join: check re-using ID of closed subflow
    
    commit 65fb58afa341ad68e71e5c4d816b407e6a683a66 upstream.
    
    This test extends "delete and re-add" to validate the previous commit. A
    new 'subflow' endpoint is added, but the subflow request will be
    rejected. The result is that no subflow will be established from this
    address.
    
    Later, the endpoint is removed and re-added after having cleared the
    firewall rule. Before the previous commit, the client would not have
    been able to create this new subflow.
    
    While at it, extra checks have been added to validate the expected
    numbers of MPJ and RM_ADDR.
    
    The 'Fixes' tag here below is the same as the one from the previous
    commit: this patch here is not fixing anything wrong in the selftests,
    but it validates the previous fix for an issue introduced by this commit
    ID.
    
    Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-4-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mptcp: join: validate fullmesh endp on 1st sf [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Aug 19 21:45:31 2024 +0200

    selftests: mptcp: join: validate fullmesh endp on 1st sf
    
    commit 4878f9f8421f4587bee7b232c1c8a9d3a7d4d782 upstream.
    
    This case was not covered, and the wrong ID was set before the previous
    commit.
    
    The rest is not modified, it is just that it will increase the code
    coverage.
    
    The right address ID can be verified by looking at the packet traces. We
    could automate that using Netfilter with some cBPF code for example, but
    that's always a bit cryptic. Packetdrill seems better fitted for that.
    
    Fixes: 4f49d63352da ("selftests: mptcp: add fullmesh testcases")
    Cc: stable@vger.kernel.org
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-13-38035d40de5b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: net: lib: ignore possible errors [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Fri Jun 7 18:31:02 2024 +0200

    selftests: net: lib: ignore possible errors
    
    [ Upstream commit 7e0620bc6a5ec6b340a0be40054f294ca26c010f ]
    
    No need to disable errexit temporarily: simply ignore the only possible
    error that is not handled.
    
    Reviewed-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-1-e36986faac94@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 7965a7f32a53 ("selftests: net: lib: kill PIDs before del netns")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: net: lib: kill PIDs before del netns [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Tue Aug 13 15:39:34 2024 +0200

    selftests: net: lib: kill PIDs before del netns
    
    [ Upstream commit 7965a7f32a53d9ad807ce2c53bdda69ba104974f ]
    
    When deleting netns, it is possible to still have some tasks running,
    e.g. background tasks like tcpdump running in the background, not
    stopped because the test has been interrupted.
    
    Before deleting the netns, it is then safer to kill all attached PIDs,
    if any. That should reduce some noise after the end of some tests, and
    help with the debugging of some issues. That's why this modification is
    seen as a "fix".
    
    Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
    Acked-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Acked-by: Florian Westphal <fw@strlen.de>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Link: https://patch.msgid.link/20240813-upstream-net-20240813-selftests-net-lib-kill-v1-1-27b689b248b8@kernel.org
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: udpgro: no need to load xdp for gro [+ + +]

Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Aug 15 15:59:51 2024 +0800

    selftests: udpgro: no need to load xdp for gro
    
    [ Upstream commit d7818402b1d80347c764001583f6d63fa68c2e1a ]
    
    After commit d7db7775ea2e ("net: veth: do not manipulate GRO when using
    XDP"), there is no need to load an XDP program to enable GRO. On the other
    hand, the current test fails because the XDP program is loaded, e.g.
    
     # selftests: net: udpgro.sh
     # ipv4
     #  no GRO              ok
     #  no GRO chk cmsg     ok
     #  GRO                 ./udpgso_bench_rx: recv: bad packet len, got 1472, expected 14720
     #
     # failed
    
     [...]
    
     #  bad GRO lookup      ok
     #  multiple GRO socks  ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
     #
     # ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
     #
     # failed
     ok 1 selftests: net: udpgro.sh
    
    After the fix, all the tests pass.
    
     # ./udpgro.sh
     ipv4
      no GRO                                  ok
      [...]
      multiple GRO socks                      ok
    
    Fixes: d7db7775ea2e ("net: veth: do not manipulate GRO when using XDP")
    Reported-by: Yi Chen <yiche@redhat.com>
    Closes: https://issues.redhat.com/browse/RHEL-53858
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: udpgro: report error when receive failed [+ + +]

Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Aug 15 15:59:50 2024 +0800

    selftests: udpgro: report error when receive failed
    
    [ Upstream commit 7167395a4be7930ecac6a33b4e54d7e3dd9ee209 ]
    
    Currently, we only check the last sender's exit code. If the receiver
    reports a failure, it is not recorded. Fix it by checking the exit code
    of all the involved processes.
    
    Before:
      bad GRO lookup       ok
      multiple GRO socks   ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
    
     ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
    
     failed
     $ echo $?
     0
    
    After:
      bad GRO lookup       ok
      multiple GRO socks   ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
    
     ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
    
     failed
     $ echo $?
     1
    
    Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO")
    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selinux: add the processing of the failure of avc_add_xperms_decision() [+ + +]

Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Wed Aug 7 17:00:56 2024 +0800

    selinux: add the processing of the failure of avc_add_xperms_decision()
    
    commit 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 upstream.
    
    When avc_add_xperms_decision() fails, the information recorded by the new
    avc node is incomplete. In this case, the new avc node should be released
    instead of replacing the old avc node.
    
    Cc: stable@vger.kernel.org
    Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
    Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selinux: fix potential counting error in avc_add_xperms_decision() [+ + +]

Author: Zhen Lei <thunder.leizhen@huawei.com>
Date:   Tue Aug 6 14:51:13 2024 +0800

    selinux: fix potential counting error in avc_add_xperms_decision()
    
    commit 379d9af3f3da2da1bbfa67baf1820c72a080d1f1 upstream.
    
    The count increases only when a node is successfully added to
    the linked list.
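    
    The ordering can be sketched in user space as follows. The structures
    and list_add_item() are hypothetical illustrations and do not mirror the
    AVC data structures:
    
    ```c
    #include <stdlib.h>

    struct item {
        struct item *next;
        int value;
    };

    struct item_list {
        struct item *head;
        unsigned int len;   /* number of nodes actually linked */
    };

    /* Hypothetical helper: bump the count only after the allocation
     * succeeded and the node is linked, so an early error return
     * leaves "len" in step with the list contents. */
    static int list_add_item(struct item_list *list, int value)
    {
        struct item *it = malloc(sizeof(*it));

        if (!it)
            return -1;      /* len is untouched on failure */

        it->value = value;
        it->next = list->head;
        list->head = it;
        list->len++;
        return 0;
    }
    ```
    
    Incrementing before the allocation would leave the count one too high
    whenever the allocation fails.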
    
    Cc: stable@vger.kernel.org
    Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selinux: revert our use of vma_is_initial_heap() [+ + +]

Author: Paul Moore <paul@paul-moore.com>
Date:   Thu Aug 8 11:57:38 2024 -0400

    selinux: revert our use of vma_is_initial_heap()
    
    commit 05a3d6e9307250a5911d75308e4363466794ab21 upstream.
    
    Unfortunately it appears that vma_is_initial_heap() is currently broken
    for applications that do not currently have any heap allocated, e.g.
    brk == start_brk.  The breakage is such that it will cause SELinux to
    check for the process/execheap permission on memory regions that cross
    brk/start_brk even when there is no heap.
    
    The proper fix would be to correct vma_is_initial_heap(), but as there
    are multiple callers I am hesitant to unilaterally modify the helper
    out of concern that I would end up breaking some other subsystem.  The
    mm developers have been made aware of the situation and hopefully they
    will have a fix at some point in the future, but we need a fix soon so
    we are simply going to revert our use of vma_is_initial_heap() in favor
    of our old logic/code which works as expected, even in the face of a
    zero size heap.  We can return to using vma_is_initial_heap() at some
    point in the future when it is fixed.
    
    Cc: stable@vger.kernel.org
    Reported-by: Marc Reisner <reisner.marc@gmail.com>
    Closes: https://lore.kernel.org/all/ZrPmoLKJEf1wiFmM@marcreisner.com
    Fixes: 68df1baf158f ("selinux: use vma_is_initial_stack() and vma_is_initial_heap()")
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb/client: avoid possible NULL dereference in cifs_free_subrequest() [+ + +]

Author: Su Hui <suhui@nfschina.com>
Date:   Thu Aug 8 20:23:32 2024 +0800

    smb/client: avoid possible NULL dereference in cifs_free_subrequest()
    
    [ Upstream commit 74c2ab6d653b4c2354df65a7f7f2df1925a40a51 ]
    
    Clang static checker (scan-build) warning:
            cifsglob.h:line 890, column 3
            Access to field 'ops' results in a dereference of a null pointer.
    
    Commit 519be989717c ("cifs: Add a tracepoint to track credits involved in
    R/W requests") adds a check for 'rdata->server', which makes clang emit
    this warning about a NULL dereference.
    
    When 'rdata->credits.value != 0 && rdata->server == NULL' happens,
    add_credits_and_wake_if() will call rdata->server->ops->add_credits().
    This will cause a NULL dereference. Add a check for 'rdata->server'
    to avoid NULL dereference.
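    
    A user-space sketch of such a guard. The structures, release_credits()
    and count_credits() are hypothetical stand-ins and not the cifs
    definitions:
    
    ```c
    #include <stddef.h>

    struct server_ops {
        void (*add_credits)(unsigned int value);
    };

    struct server {
        const struct server_ops *ops;
    };

    struct subrequest {
        struct server *server;  /* may be NULL */
        unsigned int credits;
    };

    /* Demo callback that records how many credits were returned. */
    static unsigned int returned;

    static void count_credits(unsigned int value)
    {
        returned += value;
    }

    /* Hypothetical release helper: hand credits back only when there
     * are credits to return and a server to return them to. Returns 1
     * when the callback ran and 0 when it was skipped. */
    static int release_credits(struct subrequest *req)
    {
        if (req->credits == 0 || req->server == NULL)
            return 0;

        req->server->ops->add_credits(req->credits);
        req->credits = 0;
        return 1;
    }
    ```
    
    Without the server check, a request with non-zero credits and no server
    would dereference a NULL pointer when reaching for the ops table.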
    
    Cc: stable@vger.kernel.org
    Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
    Reviewed-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Su Hui <suhui@nfschina.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

smb3: fix broken cached reads when posix locks [+ + +]

Author: Steve French <stfrench@microsoft.com>
Date:   Thu Aug 15 18:31:36 2024 -0500

    smb3: fix broken cached reads when posix locks
    
    commit e4be320eeca842a3d7648258ee3673f1755a5a59 upstream.
    
    Mandatory locking is enforced for cached reads, which violates
    default posix semantics, and also it is enforced inconsistently.
    This affected recent versions of libreoffice, and can be
    demonstrated by opening a file twice from the same client,
    locking it from handle one and trying to read from it from
    handle two (which fails, returning EACCES).
    
    There is already a mount option "forcemandatorylock"
    (which defaults to off), so with this change only when the user
    intentionally specifies "forcemandatorylock" on mount will we
    break posix semantics on read to a locked range (i.e. we will
    only fail in this case, if the user mounts with
    "forcemandatorylock").
    
    An earlier patch fixed the write path.
    
    Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
    Cc: stable@vger.kernel.org
    Cc: Pavel Shilovsky <piastryyy@gmail.com>
    Reviewed-by: David Howells <dhowells@redhat.com>
    Reported-by: abartlet@samba.org
    Reported-by: Kevin Ottens <kevin.ottens@enioka.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb3: fix lock breakage for cached writes [+ + +]

Author: Steve French <stfrench@microsoft.com>
Date:   Thu Aug 15 14:03:43 2024 -0500

    smb3: fix lock breakage for cached writes
    
    commit 836bb3268db405cf9021496ac4dbc26d3e4758fe upstream.
    
    Mandatory locking is enforced for cached writes, which violates
    default posix semantics, and also it is enforced inconsistently.
    This apparently breaks recent versions of libreoffice, but can
    also be demonstrated by opening a file twice from the same
    client, locking it from handle one and writing to it from
    handle two (which fails, returning EACCES).
    
    Since there was already a mount option "forcemandatorylock"
    (which defaults to off), with this change only when the user
    intentionally specifies "forcemandatorylock" on mount will we
    break posix semantics on write to a locked range (i.e. we will
    only fail the write in this case, if the user mounts with
    "forcemandatorylock").
    
    Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
    Cc: stable@vger.kernel.org
    Cc: Pavel Shilovsky <piastryyy@gmail.com>
    Reported-by: abartlet@samba.org
    Reported-by: Kevin Ottens <kevin.ottens@enioka.com>
    Reviewed-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: ignore unhandled reparse tags [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Wed Aug 21 00:45:03 2024 -0300

    smb: client: ignore unhandled reparse tags
    
    [ Upstream commit ec686804117a0421cf31d54427768aaf93aa0069 ]
    
    Just ignore reparse points that the client can't parse rather than
    bailing out and not opening the file or directory.
    
    Reported-by: Marc <1marc1@gmail.com>
    Closes: https://lore.kernel.org/r/CAMHwNVv-B+Q6wa0FEXrAuzdchzcJRsPKDDRrNaYZJd6X-+iJzw@mail.gmail.com
    Fixes: 539aad7f14da ("smb: client: introduce ->parse_reparse_point()")
    Tested-by: Anthony Nandaa (Microsoft) <profnandaa@gmail.com>
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

spi: spi-cadence-quadspi: Fix OSPI NOR failures during system resume [+ + +]

Author: Vignesh Raghavendra <vigneshr@ti.com>
Date:   Wed Aug 14 20:42:37 2024 +0530

    spi: spi-cadence-quadspi: Fix OSPI NOR failures during system resume
    
    [ Upstream commit 57d5af2660e9443b081eeaf1c373b3ce48477828 ]
    
    It's necessary to call pm_runtime_force_*() hooks as part of system
    suspend/resume calls so that the runtime_pm hooks get called. This
    ensures the latest state of the IP is cached and restored during system
    sleep. This is especially true if runtime autosuspend is enabled as
    runtime suspend hooks may not be called at all before system sleeps.
    
    Without this patch, OSPI NOR enumeration (READ_ID) fails during resume
    as the context saved during the suspend path is inconsistent.
    
    Fixes: 078d62de433b ("spi: cadence-qspi: add system-wide suspend and resume callbacks")
    Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
    Link: https://patch.msgid.link/20240814151237.3856184-1-vigneshr@ti.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tc-testing: don't access non-existent variable on exception [+ + +]

Author: Simon Horman <horms@kernel.org>
Date:   Thu Aug 15 16:37:13 2024 +0100

    tc-testing: don't access non-existent variable on exception
    
    [ Upstream commit a0c9fe5eecc97680323ee83780ea3eaf440ba1b7 ]
    
    Since commit 255c1c7279ab ("tc-testing: Allow test cases to be skipped")
    the variable test_ordinal doesn't exist in call_pre_case().
    So it should not be accessed when an exception occurs.
    
    This resolves the following splat:
    
      ...
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File ".../tdc.py", line 1028, in <module>
          main()
        File ".../tdc.py", line 1022, in main
          set_operation_mode(pm, parser, args, remaining)
        File ".../tdc.py", line 966, in set_operation_mode
          catresults = test_runner_serial(pm, args, alltests)
        File ".../tdc.py", line 642, in test_runner_serial
          (index, tsr) = test_runner(pm, args, alltests)
        File ".../tdc.py", line 536, in test_runner
          res = run_one_test(pm, args, index, tidx)
        File ".../tdc.py", line 419, in run_one_test
          pm.call_pre_case(tidx)
        File ".../tdc.py", line 146, in call_pre_case
          print('test_ordinal is {}'.format(test_ordinal))
      NameError: name 'test_ordinal' is not defined
    
    Fixes: 255c1c7279ab ("tc-testing: Allow test cases to be skipped")
    Signed-off-by: Simon Horman <horms@kernel.org>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://patch.msgid.link/20240815-tdc-test-ordinal-v1-1-0255c122a427@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tcp: prevent concurrent execution of tcp_sk_exit_batch [+ + +]

Author: Florian Westphal <fw@strlen.de>
Date:   Tue Aug 13 00:28:25 2024 +0200

    tcp: prevent concurrent execution of tcp_sk_exit_batch
    
    [ Upstream commit 565d121b69980637f040eb4d84289869cdaabedf ]
    
    It's possible that two threads call tcp_sk_exit_batch() concurrently,
    once from the cleanup_net workqueue, once from a task that failed to clone
    a new netns.  In the latter case, error unwinding calls the exit handlers
    in reverse order for the 'failed' netns.
    
    tcp_sk_exit_batch() calls tcp_twsk_purge().
    The problem is that since commit b099ce2602d8 ("net: Batch inet_twsk_purge"),
    this function picks up twsk in any dying netns, not just the one passed
    in via exit_batch list.
    
    This means that the error unwind of setup_net() can "steal" and destroy
    timewait sockets belonging to the exiting netns.
    
    This allows the netns exit worker to proceed to call
    
    WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount));
    
    without the expected 1 -> 0 transition, which then splats.
    
    At the same time, the error unwind path that is also running inet_twsk_purge()
    will splat as well:
    
    WARNING: .. at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210
    ...
     refcount_dec include/linux/refcount.h:351 [inline]
     inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
     inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221
     inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
     tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
     ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
     setup_net+0x714/0xb40 net/core/net_namespace.c:375
     copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
     create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
    
    ... because refcount_dec() of tw_refcount unexpectedly dropped to 0.
    
    This doesn't seem like an actual bug (no tw sockets got lost and I don't
    see a use-after-free) but rather an erroneous trigger of a debug check.
    
    Add a mutex to force strict ordering: the task that calls tcp_twsk_purge()
    blocks the other task from doing the final _dec_and_test before the mutex
    owner has removed all tw sockets of the dying netns.
    
    Fixes: e9bd0cca09d1 ("tcp: Don't allocate tcp_death_row outside of struct netns_ipv4.")
    Reported-by: syzbot+8ea26396ff85d23a8929@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netdev/0000000000003a5292061f5e4e19@google.com/
    Link: https://lore.kernel.org/netdev/20240812140104.GA21559@breakpoint.cc/
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20240812222857.29837-1-fw@strlen.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tcp: Update window clamping condition [+ + +]

Author: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Date:   Thu Aug 8 16:06:40 2024 -0700

    tcp: Update window clamping condition
    
    [ Upstream commit a2cbb1603943281a604f5adc48079a148db5cb0d ]
    
    This patch is based on the discussions between Neal Cardwell and
    Eric Dumazet in the link
    https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/
    
    It was correctly pointed out that tp->window_clamp would not be
    updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
    (copied <= tp->rcvq_space.space). While it is expected for most
    setups to leave the sysctl enabled, the latter condition may
    not end up hitting depending on the TCP receive queue size and
    the pattern of arriving data.
    
    The updated check should be hit only on initial MSS update from
    TCP_MIN_MSS to measured MSS value and subsequently if there was
    an update to a larger value.
    
    Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF")
    Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
    Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

thermal/debugfs: Fix the NULL vs IS_ERR() confusion in debugfs_create_dir() [+ + +]

Author: Yang Ruibin <11162571@vivo.com>
Date:   Wed Aug 21 03:59:33 2024 -0400

    thermal/debugfs: Fix the NULL vs IS_ERR() confusion in debugfs_create_dir()
    
    [ Upstream commit 57df60e1f981fa8c288a49012a4bbb02ae0ecdbc ]
    
    The debugfs_create_dir() return value is never NULL, it is either a
    valid pointer or an error one.
    
    Use IS_ERR() to check it.
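    
    The difference between the two checks can be illustrated with a minimal
    userspace sketch. ERR_PTR(), PTR_ERR() and IS_ERR() below are simplified
    stand-ins for the kernel helpers of the same names, and
    create_dir_model() is a hypothetical function that only follows the
    "valid pointer or error pointer" convention; it is not the debugfs API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_dir;

/* Hypothetical helper: returns a valid pointer or an error pointer, never NULL. */
static void *create_dir_model(bool fail)
{
	return fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_dir;
}

/* Old pattern: a NULL test never fires for an error pointer. */
static bool failed_null_check(const void *d) { return d == NULL; }

/* New pattern: IS_ERR() catches the error pointer. */
static bool failed_is_err_check(const void *d) { return IS_ERR(d); }
```

    With this model, failed_null_check() reports success even when
    create_dir_model(true) returned an error pointer, while
    failed_is_err_check() detects the failure.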
    
    Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
    Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
    Signed-off-by: Yang Ruibin <11162571@vivo.com>
    Link: https://patch.msgid.link/20240821075934.12145-1-11162571@vivo.com
    [ rjw: Subject and changelog edits ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

thermal: gov_bang_bang: Add .manage() callback [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue Aug 13 16:27:33 2024 +0200

    thermal: gov_bang_bang: Add .manage() callback
    
    [ Upstream commit 5f64b4a1ab1b0412446d42e1fc2964c2cdb60b27 ]
    
    After recent changes, the Bang-bang governor may not adjust the
    initial configuration of cooling devices to the actual situation.
    
    Namely, if a cooling device bound to a certain trip point starts in
    the "on" state and the thermal zone temperature is below the threshold
    of that trip point, the trip point may never be crossed on the way up
    in which case the state of the cooling device will never be adjusted
    because the thermal core will never invoke the governor's
    .trip_crossed() callback.  [Note that there is no issue if the zone
    temperature is at the trip threshold or above it to start with because
    .trip_crossed() will be invoked then to indicate the start of thermal
    mitigation for the given trip.]
    
    To address this, add a .manage() callback to the Bang-bang governor
    and use it to ensure that all of the thermal instances managed by the
    governor have been initialized properly and the states of all of the
    cooling devices involved have been adjusted to the current zone
    temperature as appropriate.
    
    Fixes: 530c932bdf75 ("thermal: gov_bang_bang: Use .trip_crossed() instead of .throttle()")
    Link: https://lore.kernel.org/linux-pm/1bfbbae5-42b0-4c7d-9544-e98855715294@piie.net/
    Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Peter Kästle <peter@piie.net>
    Reviewed-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://patch.msgid.link/8419356.T7Z3S40VBb@rjwysocki.net
    Signed-off-by: Sasha Levin <sashal@kernel.org>

thermal: gov_bang_bang: Call __thermal_cdev_update() directly [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue Aug 13 16:25:19 2024 +0200

    thermal: gov_bang_bang: Call __thermal_cdev_update() directly
    
    commit b9b6ee6fe258ce4d89592593efcd3d798c418859 upstream.
    
    Instead of clearing the "updated" flag for each cooling device
    affected by the trip point crossing in bang_bang_control() and
    walking all thermal instances to run thermal_cdev_update() for all
    of the affected cooling devices, call __thermal_cdev_update()
    directly for each of them.
    
    No intentional functional impact.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Peter Kästle <peter@piie.net>
    Reviewed-by: Zhang Rui <rui.zhang@intel.com>
    Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
    Link: https://patch.msgid.link/13583081.uLZWGnKmhe@rjwysocki.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

thermal: gov_bang_bang: Drop unnecessary cooling device target state checks [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue May 28 18:54:01 2024 +0200

    thermal: gov_bang_bang: Drop unnecessary cooling device target state checks
    
    [ Upstream commit 2c637af8a74d9a2a52ee5456a75dd29c8cb52da5 ]
    
    Some cooling device target state checks in bang_bang_control()
    done before setting the new target state are not necessary after
    recent changes, so drop them.
    
    Also avoid updating the target state before checking it for
    unexpected values.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Stable-dep-of: 84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

thermal: gov_bang_bang: Split bang_bang_control() [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue Aug 13 16:26:42 2024 +0200

    thermal: gov_bang_bang: Split bang_bang_control()
    
    [ Upstream commit 84248e35d9b60e03df7276627e4e91fbaf80f73d ]
    
    Move the setting of the thermal instance target state from
    bang_bang_control() into a separate function that will be also called
    in a different place going forward.
    
    No intentional functional impact.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Peter Kästle <peter@piie.net>
    Reviewed-by: Zhang Rui <rui.zhang@intel.com>
    Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
    Link: https://patch.msgid.link/3313587.aeNJFYEL58@rjwysocki.net
    Signed-off-by: Sasha Levin <sashal@kernel.org>

thermal: gov_bang_bang: Use governor_data to reduce overhead [+ + +]

Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue Aug 13 16:29:11 2024 +0200

    thermal: gov_bang_bang: Use governor_data to reduce overhead
    
    [ Upstream commit 6e6f58a170ea98e44075b761f2da42a5aec47dfb ]
    
    After running once, the for_each_trip_desc() loop in
    bang_bang_manage() is pure needless overhead because it is not going to
    make any changes unless a new cooling device has been bound to one of
    the trips in the thermal zone or the system is resuming from sleep.
    
    For this reason, make bang_bang_manage() set governor_data for the
    thermal zone and check it upfront to decide whether or not it needs to
    do anything.
    
    However, governor_data needs to be reset in some cases to let
    bang_bang_manage() know that it should walk the trips again, so add an
    .update_tz() callback to the governor and make the core additionally
    invoke it during system resume.
    
    To avoid affecting the other users of that callback unnecessarily, add
    a special notification reason for system resume, THERMAL_TZ_RESUME, and
    also pass it to __thermal_zone_device_update() called during system
    resume for consistency.
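    
    The caching scheme described above can be sketched in userspace as
    follows. This is only a model under stated assumptions: struct zone,
    manage_model(), update_tz_model() and the enum values are hypothetical
    names, and the trip walk is reduced to a counter.

```c
#include <stddef.h>

/* Hypothetical notification reasons, modelled on the ones named above. */
enum notify_model { TZ_BIND_CDEV, TZ_RESUME, TZ_OTHER };

struct zone {
	void *governor_data;	/* non-NULL: instances already initialized */
	int trip_walks;		/* how many times the trips were walked */
};

/* Models .manage(): walk the trips only if nothing has been cached yet. */
static void manage_model(struct zone *tz)
{
	if (tz->governor_data)
		return;

	tz->trip_walks++;		/* stands in for the trip loop */
	tz->governor_data = tz;		/* any non-NULL marker will do */
}

/* Models .update_tz(): drop the marker when the trips must be walked again. */
static void update_tz_model(struct zone *tz, enum notify_model reason)
{
	if (reason == TZ_BIND_CDEV || reason == TZ_RESUME)
		tz->governor_data = NULL;
}
```

    Repeated manage_model() calls walk the trips once; only a bind or
    resume notification makes the next call walk them again.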
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Peter Kästle <peter@piie.net>
    Reviewed-by: Zhang Rui <rui.zhang@intel.com>
    Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
    Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
    Signed-off-by: Sasha Levin <sashal@kernel.org>

thermal: of: Fix OF node leak in of_thermal_zone_find() error paths [+ + +]

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Wed Aug 14 21:58:23 2024 +0200

    thermal: of: Fix OF node leak in of_thermal_zone_find() error paths
    
    commit c0a1ef9c5be72ff28a5413deb1b3e1a066593c13 upstream.
    
    Terminating a for_each_available_child_of_node() loop early requires
    dropping the OF node reference, so bailing out on errors misses this.
    Solve the OF node reference leak with the scoped
    for_each_available_child_of_node_scoped().
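    
    The leak and the scoped fix can be modelled in userspace with the
    GCC/Clang cleanup attribute. Everything below (struct node,
    next_child(), the two macros) is a hypothetical stand-in for the OF
    helpers and assumes a GNU-compatible compiler; it only shows why an
    early return leaks with the plain iterator and not with the scoped one.

```c
#include <stddef.h>

struct node {
	int refcount;
	int bad;		/* nonzero: processing this child fails */
	struct node *next;	/* next sibling */
};

static struct node *node_get(struct node *n) { if (n) n->refcount++; return n; }
static void node_put(struct node *n) { if (n) n->refcount--; }

/* Take a reference on the next child and drop the one held on prev. */
static struct node *next_child(struct node *first, struct node *prev)
{
	struct node *next = prev ? prev->next : first;

	node_get(next);
	node_put(prev);
	return next;
}

/* Plain iterator: the caller must node_put() on any early exit. */
#define for_each_child(first, child) \
	for (child = next_child(first, NULL); child; \
	     child = next_child(first, child))

static void put_cleanup(struct node **n) { node_put(*n); }

/* Scoped iterator: the reference is dropped whenever child leaves scope. */
#define for_each_child_scoped(first, child) \
	for (struct node *child __attribute__((cleanup(put_cleanup))) = \
	     next_child(first, NULL); child; child = next_child(first, child))

static int walk_leaky(struct node *first)
{
	struct node *child;

	for_each_child(first, child) {
		if (child->bad)
			return -1;	/* reference on child is leaked */
	}
	return 0;
}

static int walk_scoped(struct node *first)
{
	for_each_child_scoped(first, child) {
		if (child->bad)
			return -1;	/* cleanup handler drops the reference */
	}
	return 0;
}
```

    In this model, walk_leaky() leaves the failing child with an elevated
    reference count after the early return, while walk_scoped() leaves all
    counts balanced.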
    
    Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
    Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Link: https://patch.msgid.link/20240814195823.437597-3-krzysztof.kozlowski@linaro.org
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

thermal: of: Fix OF node leak in thermal_of_trips_init() error path [+ + +]

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Wed Aug 14 21:58:21 2024 +0200

    thermal: of: Fix OF node leak in thermal_of_trips_init() error path
    
    commit afc954fd223ded70b1fa000767e2531db55cce58 upstream.
    
    Terminating a for_each_child_of_node() loop early requires dropping the
    OF node reference, so bailing out after a thermal_of_populate_trip()
    error misses this.  Solve the OF node reference leak with the scoped
    for_each_child_of_node_scoped().
    
    Fixes: d0c75fa2c17f ("thermal/of: Initialize trip points separately")
    Cc: All applicable <stable@vger.kernel.org>
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
    Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Link: https://patch.msgid.link/20240814195823.437597-1-krzysztof.kozlowski@linaro.org
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

thermal: of: Fix OF node leak in thermal_of_zone_register() [+ + +]

Author: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Date:   Wed Aug 14 21:58:22 2024 +0200

    thermal: of: Fix OF node leak in thermal_of_zone_register()
    
    commit 662b52b761bfe0ba970e5823759798faf809b896 upstream.
    
    thermal_of_zone_register() calls of_thermal_zone_find(), which iterates
    over OF nodes with for_each_available_child_of_node() to find the
    matching thermal zone node.  When it finds one, it exits the loop and
    returns the node.  Prematurely ending a
    for_each_available_child_of_node() loop requires dropping the OF node
    reference, so when of_thermal_zone_find() succeeds, the caller must drop
    the reference.
    
    Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization")
    Cc: All applicable <stable@vger.kernel.org>
    Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
    Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Link: https://patch.msgid.link/20240814195823.437597-2-krzysztof.kozlowski@linaro.org
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

thunderbolt: Mark XDomain as unplugged when router is removed [+ + +]

Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Thu Jun 13 15:05:03 2024 +0300

    thunderbolt: Mark XDomain as unplugged when router is removed
    
    commit e2006140ad2e01a02ed0aff49cc2ae3ceeb11f8d upstream.
    
    I noticed that when we do discrete host router NVM upgrade and it gets
    hot-removed from the PCIe side as a result of NVM firmware authentication,
    if there is another host connected with enabled paths we hang in tearing
    them down. This is due to the fact that the Thunderbolt networking driver
    also tries to clean up the paths and ends up blocking in
    tb_disconnect_xdomain_paths() waiting for the domain lock.
    
    However, at this point we already cleaned the paths in tb_stop() so
    there is really no need for tb_disconnect_xdomain_paths() to do that
    anymore. Furthermore it already checks if the XDomain is unplugged and
    bails out early so take advantage of that and mark the XDomain as
    unplugged when we remove the parent router.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tracing: Return from tracing_buffers_read() if the file has been closed [+ + +]

Author: Steven Rostedt <rostedt@goodmis.org>
Date:   Thu Aug 8 23:57:30 2024 -0400

    tracing: Return from tracing_buffers_read() if the file has been closed
    
    commit d0949cd44a62c4c41b30ea7ae94d8c887f586882 upstream.
    
    When running the following:
    
     # cd /sys/kernel/tracing/
     # echo 1 > events/sched/sched_waking/enable
     # echo 1 > events/sched/sched_switch/enable
     # echo 0 > tracing_on
     # dd if=per_cpu/cpu0/trace_pipe_raw of=/tmp/raw0.dat
    
    The dd task would get stuck in an infinite loop in the kernel. What would
    happen is the following:
    
    When ring_buffer_read_page() returns -1 (no data), a check is made to
    see if the buffer is empty (as happens when the page is not full). If it
    is, wait_on_pipe() is called to wait until the ring buffer has data. Once
    it does, the read is tried again (unless O_NONBLOCK is set).
    
    The issue happens when there's a reader and the file descriptor is closed.
    The wait_on_pipe() will return when that is the case. But this loop will
    continue to try again and wait_on_pipe() will again return immediately and
    the loop will continue and never stop.
    
    Simply check if the file was closed before looping and exit out if it is.
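    
    The retry loop and the added check can be modelled in userspace as
    below. struct reader, wait_model() and read_model() are hypothetical
    names, and max_spins exists only so that the model terminates where the
    real loop would spin forever.

```c
#include <stdbool.h>

struct reader {
	bool closed;	/* set once the file descriptor has been closed */
	int pending;	/* pages of data available */
	int wakeups;	/* how many times the wait returned */
};

/* Models the wait: returns immediately when closed, else data arrives. */
static void wait_model(struct reader *r)
{
	r->wakeups++;
	if (!r->closed)
		r->pending = 1;
}

/*
 * Returns 1 if a page was read, 0 if the loop exited because the file was
 * closed, and -1 if the model gave up after max_spins wakeups (standing in
 * for the endless loop).
 */
static int read_model(struct reader *r, bool check_closed, int max_spins)
{
	while (r->pending == 0) {
		wait_model(r);
		if (check_closed && r->closed)
			return 0;	/* the added exit condition */
		if (r->wakeups >= max_spins)
			return -1;	/* model-only guard */
	}
	r->pending--;
	return 1;
}
```

    Without the check, a closed reader burns through every allowed wakeup;
    with it, the loop exits after the first one.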
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Link: https://lore.kernel.org/20240808235730.78bf63e5@rorschach.local.home
    Fixes: 2aa043a55b9a7 ("tracing/ring-buffer: Fix wait_on_pipe() race")
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: atmel_serial: use the correct RTS flag. [+ + +]

Author: Mathieu Othacehe <m.othacehe@gmail.com>
Date:   Thu Aug 8 08:06:37 2024 +0200

    tty: atmel_serial: use the correct RTS flag.
    
    commit c9f6613b16123989f2c3bd04b1d9b2365d6914e7 upstream.
    
    In RS485 mode, the RTS pin is driven high by hardware when the transmitter
    is operating. This behaviour cannot be changed. This means that the driver
    should claim that it supports SER_RS485_RTS_ON_SEND and not
    SER_RS485_RTS_AFTER_SEND.
    
    Otherwise, when configuring the port with SER_RS485_RTS_ON_SEND, one
    gets the following warning:
    
    kern.warning kernel: atmel_usart_serial atmel_usart_serial.2.auto:
    ttyS1 (1): invalid RTS setting, using RTS_AFTER_SEND instead
    
    which is contradictory with what's really happening.
    
    Signed-off-by: Mathieu Othacehe <othacehe@gnu.org>
    Cc: stable <stable@kernel.org>
    Tested-by: Alexander Dahl <ada@thorsis.com>
    Fixes: af47c491e3c7 ("serial: atmel: Fill in rs485_supported")
    Link: https://lore.kernel.org/r/20240808060637.19886-1-othacehe@gnu.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: serial: fsl_lpuart: mark last busy before uart_add_one_port [+ + +]

Author: Peng Fan <peng.fan@nxp.com>
Date:   Thu Aug 8 22:03:25 2024 +0800

    tty: serial: fsl_lpuart: mark last busy before uart_add_one_port
    
    commit dc98d76a15bc29a9a4e76f2f65f39f3e590fb15c upstream.
    
    With "earlycon initcall_debug=1 loglevel=8" in bootargs, the kernel
    sometimes hangs during boot. This is because runtime suspend is called
    while the normal console is not ready yet, so the early console putchar
    hangs waiting for TRDE to be set in UARTSTAT.
    
    The lpuart driver has its autosuspend delay set to 3000ms, but during
    uart_add_one_port(), a child device (serial ctrl) is added and probed
    with its runtime PM enabled (see serial_ctrl.c).
    The runtime suspend call path is:
    device_add
         |-> bus_probe_device
               |->device_initial_probe
                       |->__device_attach
                             |-> pm_runtime_get_sync(dev->parent);
                             |-> pm_request_idle(dev);
                             |-> pm_runtime_put(dev->parent);
    
    So in the end, the lpuart gets runtime suspended before the normal
    console is ready, and the earlycon putchar hangs.
    
    To address the issue, mark last busy just after pm_runtime_enable();
    three seconds is long enough to switch from the bootconsole to the
    normal console.
    
    Fixes: 43543e6f539b ("tty: serial: fsl_lpuart: Add runtime pm support")
    Cc: stable <stable@kernel.org>
    Signed-off-by: Peng Fan <peng.fan@nxp.com>
    Link: https://lore.kernel.org/r/20240808140325.580105-1-peng.fan@oss.nxp.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: vt: conmakehash: remove non-portable code printing comment header [+ + +]

Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Sat Aug 10 01:07:20 2024 +0900

    tty: vt: conmakehash: remove non-portable code printing comment header
    
    commit 7258fdd7d7459616b3fe1a603e33900584b10c13 upstream.
    
    Commit 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no
    longer in env") included <linux/limits.h>, which invoked another
    (wrong) patch that tried to address a build error on macOS.
    
    According to the specification [1], the correct header to use PATH_MAX
    is <limits.h>.
    
    The minimal fix would be to replace <linux/limits.h> with <limits.h>.
    
    However, the following commits seem questionable to me:
    
     - 3bd85c6c97b2 ("tty: vt: conmakehash: Don't mention the full path of the input in output")
     - 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no longer in env")
    
    These commits made too many efforts to cope with a comment header in
    drivers/tty/vt/consolemap_deftbl.c:
    
      /*
       * Do not edit this file; it was automatically generated by
       *
       * conmakehash drivers/tty/vt/cp437.uni > [this file]
       *
       */
    
    With this commit, the header part of the generated C file will be
    simplified as follows:
    
      /*
       * Automatically generated file; Do not edit.
       */
    
    BTW, another series of excessive efforts for a comment header can be
    seen in the following:
    
     - 5ef6dc08cfde ("lib/build_OID_registry: don't mention the full path of the script in output")
     - 2fe29fe94563 ("lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat")
    
    [1]: https://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html
    
    Fixes: 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no longer in env")
    Cc: stable <stable@kernel.org>
    Reported-by: Daniel Gomez <da.gomez@samsung.com>
    Closes: https://lore.kernel.org/all/20240807-macos-build-support-v1-11-4cd1ded85694@samsung.com/
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Link: https://lore.kernel.org/r/20240809160853.1269466-1-masahiroy@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

udp: fix receiving fraglist GSO packets [+ + +]

Author: Felix Fietkau <nbd@nbd.name>
Date:   Mon Aug 19 17:06:21 2024 +0200

    udp: fix receiving fraglist GSO packets
    
    [ Upstream commit b128ed5ab27330deeeaf51ea8bb69f1442a96f7f ]
    
    When assembling fraglist GSO packets, udp4_gro_complete does not set
    skb->csum_start, which makes the extra validation in __udp_gso_segment fail.
    
    Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr")
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Link: https://patch.msgid.link/20240819150621.59833-1-nbd@nbd.name
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[] [+ + +]

Author: Hans de Goede <hdegoede@redhat.com>
Date:   Mon Aug 12 11:50:38 2024 +0200

    usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[]
    
    commit 3ed486e383ccee9b0c8d727608f12a937c6603ca upstream.
    
    Add LJCA GPIO support for the Lunar Lake platform.
    
    New HID taken from out of tree ivsc-driver git repo.
    
    Link: https://github.com/intel/ivsc-driver/commit/47e7c4a446c8ea8c741ff5a32fa7b19f9e6fd47e
    Cc: stable <stable@kernel.org>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Link: https://lore.kernel.org/r/20240812095038.555837-1-hdegoede@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup() [+ + +]

Author: Marc Zyngier <maz@kernel.org>
Date:   Fri Aug 9 15:44:07 2024 +0300

    usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup()
    
    commit dcdb52d948f3a17ccd3fce757d9bd981d7c32039 upstream.
    
    If xhci_mem_init() fails, it calls into xhci_mem_cleanup() to mop
    up the damage. If it fails early enough, before xhci->interrupters
    is allocated but after xhci->max_interrupters has been set, which
    happens in most (all?) cases, things get uglier, as xhci_mem_cleanup()
    unconditionally dereferences xhci->interrupters. With prejudice.
    
    Gate the interrupt freeing loop with a check on xhci->interrupters
    being non-NULL.
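    
    A minimal userspace sketch of such a guarded cleanup loop follows.
    struct ctrl and mem_cleanup_model() are hypothetical stand-ins, not the
    xhci structures; the point is only that the loop condition tests the
    array pointer before indexing it.

```c
#include <stddef.h>
#include <stdlib.h>

struct ctrl {
	unsigned int max_interrupters;	/* set early during init */
	void **interrupters;		/* NULL until the array is allocated */
};

/* Frees whatever was allocated and returns the number of entries freed. */
static int mem_cleanup_model(struct ctrl *c)
{
	int freed = 0;

	/* The pointer check gates the loop, so an early failure is harmless. */
	for (unsigned int i = 0;
	     c->interrupters && i < c->max_interrupters; i++) {
		if (c->interrupters[i]) {
			free(c->interrupters[i]);
			c->interrupters[i] = NULL;
			freed++;
		}
	}

	free(c->interrupters);		/* free(NULL) is a no-op */
	c->interrupters = NULL;
	return freed;
}
```

    Calling mem_cleanup_model() on a controller whose max_interrupters is
    set but whose array was never allocated simply frees nothing.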
    
    Found while debugging a DMA allocation issue that led the XHCI driver
    on this exact path.
    
    Fixes: c99b38c41234 ("xhci: add support to allocate several interrupters")
    Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
    Cc: Wesley Cheng <quic_wcheng@quicinc.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org # 6.8+
    Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
    Link: https://lore.kernel.org/r/20240809124408.505786-2-mathias.nyman@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vfs: Don't evict inode under the inode lru traversing context [+ + +]

Author: Zhihao Cheng <chengzhihao1@huawei.com>
Date:   Fri Aug 9 11:16:28 2024 +0800

    vfs: Don't evict inode under the inode lru traversing context
    
    commit 2a0629834cd82f05d424bbc193374f9a43d1f87d upstream.
    
    The inode reclaiming process (see prune_icache_sb()) first collects all
    reclaimable inodes and marks them with the I_FREEING flag. From that
    point on, other processes get stuck if they try to get these inodes
    (see find_inode_fast()). The reclaiming process then destroys the
    inodes via dispose_list(). Some filesystems (e.g. ext4 with the
    ea_inode feature, ubifs with xattr) may do an inode lookup in the inode
    evicting callback; if that lookup is performed in the inode lru
    traversing context, deadlocks may happen.
    
    Case 1: In ext4_evict_inode(), the ea inode lookup can happen if the
            ea_inode feature is enabled, and the lookup gets stuck in the
            evicting context like this:
    
     1. File A has inode i_reg and an ea inode i_ea
     2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea
     3. Then, the following two processes run like this:
    
        PA                              PB
     echo 2 > /proc/sys/vm/drop_caches
      shrink_slab
       prune_dcache_sb
       // i_reg is added into lru, lru->i_ea->i_reg
       prune_icache_sb
        list_lru_walk_one
         inode_lru_isolate
          i_ea->i_state |= I_FREEING // set inode state
         inode_lru_isolate
          __iget(i_reg)
          spin_unlock(&i_reg->i_lock)
          spin_unlock(lru_lock)
                                         rm file A
                                          i_reg->nlink = 0
          iput(i_reg) // i_reg->nlink is 0, do evict
           ext4_evict_inode
            ext4_xattr_delete_inode
             ext4_xattr_inode_dec_ref_all
              ext4_xattr_inode_iget
               ext4_iget(i_ea->i_ino)
                iget_locked
                 find_inode_fast
                  __wait_on_freeing_inode(i_ea) ----→ AA deadlock
        dispose_list // cannot be executed by prune_icache_sb
         wake_up_bit(&i_ea->i_state)
    
    Case 2: In the deleted inode writing function ubifs_jnl_write_inode(), the
            file deleting process holds BASEHD's wbuf->io_mutex while getting
            the xattr inode, which can race with the inode reclaiming process
            (the reclaiming process may try to lock BASEHD's wbuf->io_mutex in
            the inode evicting function), resulting in an ABBA deadlock as
            follows:
    
     1. File A has inode ia and an xattr (with inode ixa), regular file B has
        inode ib and an xattr.
     2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa
     3. Then, the following three processes run like this:
    
            PA                PB                        PC
                    echo 2 > /proc/sys/vm/drop_caches
                     shrink_slab
                      prune_dcache_sb
                      // ib and ia are added into lru, lru->ixa->ib->ia
                      prune_icache_sb
                       list_lru_walk_one
                        inode_lru_isolate
                         ixa->i_state |= I_FREEING // set inode state
                        inode_lru_isolate
                         __iget(ib)
                         spin_unlock(&ib->i_lock)
                         spin_unlock(lru_lock)
                                                       rm file B
                                                        ib->nlink = 0
     rm file A
      iput(ia)
       ubifs_evict_inode(ia)
        ubifs_jnl_delete_inode(ia)
         ubifs_jnl_write_inode(ia)
          make_reservation(BASEHD) // Lock wbuf->io_mutex
          ubifs_iget(ixa->i_ino)
           iget_locked
            find_inode_fast
             __wait_on_freeing_inode(ixa)
              |          iput(ib) // ib->nlink is 0, do evict
              |           ubifs_evict_inode
              |            ubifs_jnl_delete_inode(ib)
              ↓             ubifs_jnl_write_inode
         ABBA deadlock ←-----make_reservation(BASEHD)
                       dispose_list // cannot be executed by prune_icache_sb
                        wake_up_bit(&ixa->i_state)
    
    Fix the possible deadlock by using a new inode state flag, I_LRU_ISOLATING,
    to pin the inode in memory while inode_lru_isolate() reclaims its pages,
    instead of using an ordinary inode reference. This way inode deletion
    cannot be triggered from inode_lru_isolate(), thus avoiding the deadlock.
    evict() is made to wait for I_LRU_ISOLATING to be cleared before
    proceeding with inode cleanup.
    
    Link: https://lore.kernel.org/all/37c29c42-7685-d1f0-067d-63582ffac405@huaweicloud.com/
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022
    Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
    Fixes: 7959cf3a7506 ("ubifs: journal: Handle xattrs like files")
    Cc: stable@vger.kernel.org
    Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Link: https://lore.kernel.org/r/20240809031628.1069873-1-chengzhihao@huaweicloud.com
    Reviewed-by: Jan Kara <jack@suse.cz>
    Suggested-by: Jan Kara <jack@suse.cz>
    Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vsock: fix recursive ->recvmsg calls [+ + +]

Author: Cong Wang <cong.wang@bytedance.com>
Date:   Sun Aug 11 19:21:53 2024 -0700

    vsock: fix recursive ->recvmsg calls
    
    [ Upstream commit 69139d2919dd4aa9a553c8245e7c63e82613e3fc ]
    
    After a vsock socket has been added to a BPF sockmap, its prot->recvmsg
    has been replaced with vsock_bpf_recvmsg(). Thus the following
    recursion could happen:
    
    vsock_bpf_recvmsg()
     -> __vsock_recvmsg()
      -> vsock_connectible_recvmsg()
       -> prot->recvmsg()
        -> vsock_bpf_recvmsg() again
    
    We need to fix it by calling the original ->recvmsg() without any BPF
    sockmap logic in __vsock_recvmsg().
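    
    The call chain above can be modelled in userspace with a replaceable
    function pointer. All names below are hypothetical stand-ins for the
    functions in the chain, and MAX_DEPTH is a guard that exists only so the
    buggy variant of the model terminates.

```c
#include <stddef.h>

struct sock_model;
typedef int (*recvmsg_fn)(struct sock_model *sk, int depth);

struct sock_model {
	recvmsg_fn recvmsg;	/* models prot->recvmsg, replaceable by sockmap */
};

#define MAX_DEPTH 8		/* recursion guard for the model only */

/* Models the original handler: reports the depth it was reached at. */
static int transport_recvmsg(struct sock_model *sk, int depth)
{
	(void)sk;
	return depth;
}

/* Buggy inner path: dispatches through the (replaced) function pointer. */
static int inner_recvmsg_indirect(struct sock_model *sk, int depth)
{
	if (depth >= MAX_DEPTH)
		return -1;	/* would recurse without bound */
	return sk->recvmsg(sk, depth + 1);
}

/* Fixed inner path: calls the original handler directly. */
static int inner_recvmsg_direct(struct sock_model *sk, int depth)
{
	return transport_recvmsg(sk, depth + 1);
}

/* Two variants of the handler installed by the sockmap model. */
static int bpf_recvmsg_buggy(struct sock_model *sk, int depth)
{
	return inner_recvmsg_indirect(sk, depth + 1);
}

static int bpf_recvmsg_fixed(struct sock_model *sk, int depth)
{
	return inner_recvmsg_direct(sk, depth + 1);
}
```

    With bpf_recvmsg_buggy installed, a call through sk->recvmsg keeps
    re-entering itself until the guard trips; with bpf_recvmsg_fixed it
    reaches transport_recvmsg() after two hops.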
    
    Fixes: 634f1a7110b4 ("vsock: support sockmap")
    Reported-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com
    Tested-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com
    Cc: Bobby Eshleman <bobby.eshleman@bytedance.com>
    Cc: Michael S. Tsirkin <mst@redhat.com>
    Cc: Stefano Garzarella <sgarzare@redhat.com>
    Signed-off-by: Cong Wang <cong.wang@bytedance.com>
    Acked-by: Michael S. Tsirkin <mst@redhat.com>
    Link: https://patch.msgid.link/20240812022153.86512-1-xiyou.wangcong@gmail.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850 [+ + +]

Author: Baochen Qiang <quic_bqiang@quicinc.com>
Date:   Thu Aug 1 18:04:07 2024 +0300

    wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850
    
    [ Upstream commit 38055789d15155109b41602ad719d770af507030 ]
    
    In the transmit path, it is likely that the iova is not aligned to the PCIe
    TLP max payload size, which is 128 bytes for WCN7850. Normally in such cases
    the hardware is expected to split the packet into several parts such that
    all of them, other than the first one, have an aligned iova. However, due to
    hardware limitations, WCN7850 does not do this properly for some specific
    unaligned iovas in the transmit path. This easily results in a target hang
    in a KPI transmit test: packet send/receive failure, WMI command send
    timeout, etc. A fatal error is also seen at the PCIe level:
    
            ...
            Capabilities: ...
                    ...
                    DevSta: ... FatalErr+ ...
                    ...
            ...
    
    Work around this by manually moving/reallocating the payload buffer such
    that we can map it to a 128 bytes aligned iova. The moving requires
    sufficient head room or tail room in the skb: for the former we can do
    ourselves a favor by asking for some extra bytes when registering with
    mac80211, while for the latter we can do nothing.
    
    Moving/reallocating buffer consumes additional CPU cycles, but the good news
    is that an aligned iova increases PCIe efficiency. In my tests on some X86
    platforms the KPI results are almost consistent.
    
    Since this is seen only with WCN7850, add a new hardware parameter to
    differentiate from others.
    
    Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
    
    Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
    Cc: <stable@vger.kernel.org>
    Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
    Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
    Link: https://patch.msgid.link/20240715023814.20242-1-quic_bqiang@quicinc.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>
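
    The alignment decision described above can be sketched roughly as
    follows. This is not the driver's code: the toy_* names, the plan
    structure and the order in which head room and tail room are tried are
    assumptions made for illustration. Only the 128-byte alignment and the
    move-or-reallocate idea come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IOVA_ALIGN 128u                  /* alignment named in the text */
#define TOY_IOVA_MASK  (TOY_IOVA_ALIGN - 1u)

/* Hypothetical outcome of the alignment decision. */
enum toy_move {
	TOY_MOVE_NONE,     /* already aligned */
	TOY_MOVE_TO_HEAD,  /* shift payload towards lower addresses */
	TOY_MOVE_TO_TAIL,  /* shift payload towards higher addresses */
	TOY_REALLOC,       /* not enough room either way */
};

struct toy_plan {
	enum toy_move how;
	uint32_t bytes;    /* distance to move the payload, 0 if not moved */
};

/*
 * Decide how to make 'data' 128-byte aligned given the free bytes in front
 * of (headroom) and behind (tailroom) the payload.
 */
static struct toy_plan toy_plan_alignment(uintptr_t data, uint32_t headroom,
					  uint32_t tailroom)
{
	uint32_t offset = (uint32_t)(data & TOY_IOVA_MASK);
	struct toy_plan plan = { TOY_MOVE_NONE, 0 };

	if (offset == 0)
		return plan;

	if (headroom >= offset) {
		plan.how = TOY_MOVE_TO_HEAD;
		plan.bytes = offset;
	} else if (tailroom >= TOY_IOVA_ALIGN - offset) {
		plan.how = TOY_MOVE_TO_TAIL;
		plan.bytes = TOY_IOVA_ALIGN - offset;
	} else {
		plan.how = TOY_REALLOC;
	}
	return plan;
}
```

    For example, a payload at an address ending in 0x10 with at least 16
    bytes of head room would be moved 16 bytes towards the head in this
    model, which is why requesting extra head room up front helps.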

wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletion [+ + +]

Author: Janne Grunau <j@jannau.net>
Date:   Sat Aug 3 21:52:55 2024 +0200

    wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletion
    
    commit 2ad4e1ada8eebafa2d75a4b75eeeca882de6ada1 upstream.
    
    Since 1efdba5fdc2c ("Handle PMKSA flush in the driver for SAE/OWE offload
    cases"), wpa_supplicant 2.11 sends SSID based PMKSA del commands.
    brcmfmac is not prepared for this and tries to dereference the NULL bssid
    and pmkid pointers in cfg80211_pmksa. PMKID_V3 operations support SSID
    based updates, so copy the SSID.
    
    Fixes: a96202acaea4 ("wifi: brcmfmac: cfg80211: Add support for PMKID_V3 operations")
    Cc: stable@vger.kernel.org # 6.4.x
    Signed-off-by: Janne Grunau <j@jannau.net>
    Reviewed-by: Neal Gompa <neal@gompa.dev>
    Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
    Signed-off-by: Kalle Valo <kvalo@kernel.org>
    Link: https://patch.msgid.link/20240803-brcmfmac_pmksa_del_ssid-v1-1-4e85f19135e1@jannau.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
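
    A minimal sketch of the defensive pattern described above, assuming a
    request in which any of the bssid, pmkid and ssid pointers may be NULL.
    The toy_* structures, sizes and the helper are hypothetical and do not
    reproduce the brcmfmac or cfg80211 definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_ETH_ALEN     6
#define TOY_PMKID_LEN    16
#define TOY_MAX_SSID_LEN 32

/* Hypothetical request: an SSID based deletion leaves bssid/pmkid NULL. */
struct toy_pmksa_req {
	const unsigned char *bssid;
	const unsigned char *pmkid;
	const unsigned char *ssid;
	size_t ssid_len;
};

/* Hypothetical firmware-facing entry with fixed-size fields. */
struct toy_fw_pmksa {
	unsigned char bssid[TOY_ETH_ALEN];
	unsigned char pmkid[TOY_PMKID_LEN];
	unsigned char ssid[TOY_MAX_SSID_LEN];
	size_t ssid_len;
};

/* Copy only the identifiers that are present; returns 0 or -1. */
static int toy_fill_pmksa(struct toy_fw_pmksa *out,
			  const struct toy_pmksa_req *req)
{
	memset(out, 0, sizeof(*out));

	if (req->bssid)
		memcpy(out->bssid, req->bssid, TOY_ETH_ALEN);
	if (req->pmkid)
		memcpy(out->pmkid, req->pmkid, TOY_PMKID_LEN);
	if (req->ssid && req->ssid_len) {
		if (req->ssid_len > TOY_MAX_SSID_LEN)
			return -1; /* reject oversized SSID */
		memcpy(out->ssid, req->ssid, req->ssid_len);
		out->ssid_len = req->ssid_len;
	}
	return 0;
}
```

    With a request that carries only an SSID, the helper leaves the bssid
    and pmkid fields zeroed rather than reading through NULL pointers.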

workqueue: Fix spurious data race in __flush_work() [+ + +]

Author: Tejun Heo <tj@kernel.org>
Date:   Mon Aug 5 09:37:25 2024 -1000

    workqueue: Fix spurious data race in __flush_work()
    
    [ Upstream commit 8bc35475ef1a23b0e224f3242eb11c76cab0ea88 ]
    
    When flushing a work item for cancellation, __flush_work() knows that it
    exclusively owns the work item through its PENDING bit. 134874e2eee9
    ("workqueue: Allow cancel_work_sync() and disable_work() from atomic
    contexts on BH work items") added a read of @work->data to determine whether
    to use busy wait for BH work items that are being canceled. While the read
    is safe when @from_cancel, @work->data was read before testing @from_cancel
    to simplify code structure:
    
            data = *work_data_bits(work);
            if (from_cancel &&
                !WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) {
    
    While the read data was never used if !@from_cancel, this could trigger
    KCSAN data race detection spuriously:
    
      ==================================================================
      BUG: KCSAN: data-race in __flush_work / __flush_work
    
      write to 0xffff8881223aa3e8 of 8 bytes by task 3998 on cpu 0:
       instrument_write include/linux/instrumented.h:41 [inline]
       ___set_bit include/asm-generic/bitops/instrumented-non-atomic.h:28 [inline]
       insert_wq_barrier kernel/workqueue.c:3790 [inline]
       start_flush_work kernel/workqueue.c:4142 [inline]
       __flush_work+0x30b/0x570 kernel/workqueue.c:4178
       flush_work kernel/workqueue.c:4229 [inline]
       ...
    
      read to 0xffff8881223aa3e8 of 8 bytes by task 50 on cpu 1:
       __flush_work+0x42a/0x570 kernel/workqueue.c:4188
       flush_work kernel/workqueue.c:4229 [inline]
       flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251
       ...
    
      value changed: 0x0000000000400000 -> 0xffff88810006c00d
    
    Reorganize the code so that @from_cancel is tested before @work->data is
    accessed. The only problem is triggering KCSAN detection spuriously. This
    shouldn't need READ_ONCE() or other access qualifiers.
    
    No functional changes.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-by: syzbot+b3e4f2f51ed645fd5df2@syzkaller.appspotmail.com
    Fixes: 134874e2eee9 ("workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items")
    Link: http://lkml.kernel.org/r/000000000000ae429e061eea2157@google.com
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
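
    The reordering described above can be sketched as below: the shared word
    is read only after @from_cancel has been tested. The toy_* names, the
    bit positions and the read counter are hypothetical and exist only to
    make the ordering observable; this is not the kernel's __flush_work().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, not the kernel's values. */
#define TOY_STRUCT_PWQ (1UL << 2)
#define TOY_OFFQ_BH    (1UL << 5)

static int toy_reads; /* counts reads of the shared data word */

/* Stand-in for reading the work item's data word. */
static unsigned long toy_read_data(const unsigned long *word)
{
	toy_reads++;
	return *word;
}

/*
 * Decide whether to busy-wait. The data word is only read when the caller
 * is cancelling, i.e. when it is known to own the work item.
 */
static bool toy_should_busy_wait(const unsigned long *word, bool from_cancel)
{
	unsigned long data;

	if (!from_cancel)
		return false; /* no ownership: do not touch the word */

	data = toy_read_data(word);
	return !(data & TOY_STRUCT_PWQ) && (data & TOY_OFFQ_BH);
}
```

    Calling toy_should_busy_wait() with from_cancel set to false leaves
    toy_reads unchanged, which is the property that avoids the spurious
    report in this model.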

workqueue: Fix UBSAN 'subtraction overflow' error in shift_and_mask() [+ + +]

Author: Will Deacon <will@kernel.org>
Date:   Tue Jul 30 12:44:31 2024 +0100

    workqueue: Fix UBSAN 'subtraction overflow' error in shift_and_mask()
    
    [ Upstream commit 38f7e14519d39cf524ddc02d4caee9b337dad703 ]
    
    UBSAN reports the following 'subtraction overflow' error when booting
    in a virtual machine on Android:
    
     | Internal error: UBSAN: integer subtraction overflow: 00000000f2005515 [#1] PREEMPT SMP
     | Modules linked in:
     | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-00006-g3cbe9e5abd46-dirty #4
     | Hardware name: linux,dummy-virt (DT)
     | pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     | pc : cancel_delayed_work+0x34/0x44
     | lr : cancel_delayed_work+0x2c/0x44
     | sp : ffff80008002ba60
     | x29: ffff80008002ba60 x28: 0000000000000000 x27: 0000000000000000
     | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
     | x23: 0000000000000000 x22: 0000000000000000 x21: ffff1f65014cd3c0
     | x20: ffffc0e84c9d0da0 x19: ffffc0e84cab3558 x18: ffff800080009058
     | x17: 00000000247ee1f8 x16: 00000000247ee1f8 x15: 00000000bdcb279d
     | x14: 0000000000000001 x13: 0000000000000075 x12: 00000a0000000000
     | x11: ffff1f6501499018 x10: 00984901651fffff x9 : ffff5e7cc35af000
     | x8 : 0000000000000001 x7 : 3d4d455453595342 x6 : 000000004e514553
     | x5 : ffff1f6501499265 x4 : ffff1f650ff60b10 x3 : 0000000000000620
     | x2 : ffff80008002ba78 x1 : 0000000000000000 x0 : 0000000000000000
     | Call trace:
     |  cancel_delayed_work+0x34/0x44
     |  deferred_probe_extend_timeout+0x20/0x70
     |  driver_register+0xa8/0x110
     |  __platform_driver_register+0x28/0x3c
     |  syscon_init+0x24/0x38
     |  do_one_initcall+0xe4/0x338
     |  do_initcall_level+0xac/0x178
     |  do_initcalls+0x5c/0xa0
     |  do_basic_setup+0x20/0x30
     |  kernel_init_freeable+0x8c/0xf8
     |  kernel_init+0x28/0x1b4
     |  ret_from_fork+0x10/0x20
     | Code: f9000fbf 97fffa2f 39400268 37100048 (d42aa2a0)
     | ---[ end trace 0000000000000000 ]---
     | Kernel panic - not syncing: UBSAN: integer subtraction overflow: Fatal exception
    
    This is due to shift_and_mask() using a signed immediate to construct
    the mask and being called with a shift of 31 (WORK_OFFQ_POOL_SHIFT) so
    that it ends up decrementing from INT_MIN.
    
    Use an unsigned constant '1U' to generate the mask in shift_and_mask().
    
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Fixes: 1211f3b21c2a ("workqueue: Preserve OFFQ bits in cancel[_sync] paths")
    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
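
    A small sketch of the fixed helper, assuming that the 31 mentioned above
    ends up as the bit count used to build the mask. The name
    toy_shift_and_mask and the assert are additions for illustration; only
    the use of '1U' comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract 'bits' bits of 'v' starting at bit 'shift'.
 *
 * With a signed '1' and bits == 31, the mask expression would start from
 * INT_MIN (as the commit message puts it) and subtracting 1 overflows.
 * With '1U', 1U << 31 is 0x80000000 and the subtraction gives 0x7fffffff
 * in well-defined unsigned arithmetic.
 */
static unsigned long toy_shift_and_mask(unsigned long v, uint32_t shift,
					uint32_t bits)
{
	assert(bits < 32); /* the shift count must stay below the int width */
	return (v >> shift) & ((1U << bits) - 1);
}
```

    For instance, toy_shift_and_mask(0xABCD, 4, 8) yields 0xBC, and a
    31-bit mask yields 0x7fffffff for an all-ones 32-bit input.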

xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration [+ + +]

Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date:   Thu Aug 15 17:11:17 2024 +0300

    xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration
    
    commit af8e119f52e9c13e556be9e03f27957554a84656 upstream.
    
    Re-enumerating full-speed devices after a failed address device command
    can trigger a NULL pointer dereference.
    
    Full-speed devices may need to reconfigure the endpoint 0 Max Packet Size
    value during enumeration. USB core calls usb_ep0_reinit() in this case,
    which ends up calling xhci_configure_endpoint().
    
    On Panther Point xHC the xhci_configure_endpoint() function will
    additionally check and reserve bandwidth in software. Other hosts do
    this in hardware.
    
    If the xHC address device command fails, then a new xhci_virt_device
    structure is allocated as part of re-enabling the slot, but the bandwidth
    table pointers are not set up properly here.
    This triggers the NULL pointer dereference the next time usb_ep0_reinit()
    is called and xhci_configure_endpoint() tries to check and reserve
    bandwidth:
    
    [46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
    [46710.713699] usb 3-1: Device not responding to setup address.
    [46710.917684] usb 3-1: Device not responding to setup address.
    [46711.125536] usb 3-1: device not accepting address 5, error -71
    [46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
    [46711.125600] #PF: supervisor read access in kernel mode
    [46711.125603] #PF: error_code(0x0000) - not-present page
    [46711.125606] PGD 0 P4D 0
    [46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
    [46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1
    [46711.125620] Hardware name: Gigabyte Technology Co., Ltd.
    [46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
    [46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c
    
    Fix this by making sure bandwidth table pointers are set up correctly
    after a failed address device command, and additionally by avoiding
    checking for bandwidth in cases like this where no actual endpoints are
    added or removed, i.e. only context for default control endpoint 0 is
    evaluated.
    
    Reported-by: Karel Balej <balejk@matfyz.cz>
    Closes: https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@matfyz.cz/
    Cc: stable@vger.kernel.org
    Fixes: 651aaf36a7d7 ("usb: xhci: Handle USB transaction error on address command")
    Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
    Link: https://lore.kernel.org/r/20240815141117.2702314-2-mathias.nyman@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
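
    The second part of the fix, skipping the software bandwidth check when
    only the default control endpoint context is evaluated, can be sketched
    as a predicate over the add and drop flags. The toy_* names are
    hypothetical, and the flag layout (bit 0 for the slot context, bit 1 for
    endpoint 0) is this sketch's assumption about the input control context;
    the driver's actual condition may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag layout for illustration. */
#define TOY_SLOT_FLAG (1u << 0) /* slot context */
#define TOY_EP0_FLAG  (1u << 1) /* default control endpoint 0 */

/*
 * Return true only if the host checks bandwidth in software and the
 * command adds or drops some endpoint other than endpoint 0.
 */
static bool toy_needs_bw_check(bool sw_bw_checking, uint32_t add_flags,
			       uint32_t drop_flags)
{
	uint32_t ep_changes = (add_flags | drop_flags) &
			      ~(TOY_SLOT_FLAG | TOY_EP0_FLAG);

	if (!sw_bw_checking)
		return false; /* bandwidth is handled by the hardware */

	return ep_changes != 0;
}
```

    In this model an endpoint 0 Max Packet Size update, which sets only the
    slot and endpoint 0 flags, never reaches the bandwidth tables.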