Список изменений в ядре 6.1.85

9p: Fix read/write debug statements to report server reply [+ + +]

Author: Dominique Martinet <asmadeus@codewreck.org>
Date:   Tue Jan 9 12:39:03 2024 +0900

    9p: Fix read/write debug statements to report server reply
    
    [ Upstream commit be3193e58ec210b2a72fb1134c2a0695088a911d ]
    
    Previous conversion to iov missed these debug statements which would now
    always print the requested size instead of the actual server reply.
    
    Write also added a loop in a much older commit but we didn't report
    these, while reads do report each iteration -- it's more coherent to
    keep reporting all requests to server so move that at the same time.
    
    Fixes: 7f02464739da ("9p: convert to advancing variant of iov_iter_get_pages_alloc()")
    Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
    Message-ID: <20240109-9p-rw-trace-v1-1-327178114257@codewreck.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields() [+ + +]

Author: Nikita Kiryushin <kiryushin@ancud.ru>
Date:   Fri Mar 22 21:07:53 2024 +0300

    ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
    
    [ Upstream commit 40e2710860e57411ab57a1529c5a2748abbe8a19 ]
    
    ACPICA commit 9061cd9aa131205657c811a52a9f8325a040c6c9
    
    Errors in acpi_evaluate_object() can lead to incorrect state of buffer.
    
    This can lead to access to data in previously ACPI_FREEd buffer and
    secondary ACPI_FREE to the same buffer later.
    
    Handle errors in acpi_evaluate_object the same way it is done earlier
    with acpi_ns_handle_to_pathname.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Link: https://github.com/acpica/acpica/commit/9061cd9a
    Fixes: 5fd033288a86 ("ACPICA: debugger: add command to dump all fields of particular subtype")
    Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: hda/realtek - Fix inactive headset mic jack [+ + +]

Author: Christoffer Sandberg <cs@tuxedo.de>
Date:   Thu Mar 28 11:27:57 2024 +0100

    ALSA: hda/realtek - Fix inactive headset mic jack
    
    commit daf6c4681a74034d5723e2fb761e0d7f3a1ca18f upstream.
    
    This patch adds the existing fixup to certain TF platforms implementing
    the ALC274 codec with a headset jack. It fixes/activates the inactive
    microphone of the headset.
    
    Signed-off-by: Christoffer Sandberg <cs@tuxedo.de>
    Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
    Cc: <stable@vger.kernel.org>
    Message-ID: <20240328102757.50310-1-wse@tuxedocomputers.com>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone [+ + +]

Author: I Gede Agastya Darma Laksana <gedeagas22@gmail.com>
Date:   Tue Apr 2 00:46:02 2024 +0700

    ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
    
    commit 1576f263ee2147dc395531476881058609ad3d38 upstream.
    
    This patch addresses an issue with the Panasonic CF-SZ6's existing quirk,
    specifically its headset microphone functionality. Previously, the quirk
    used ALC269_FIXUP_HEADSET_MODE, which does not support the CF-SZ6's design
    of a single 3.5mm jack for both mic and audio output effectively. The
    device uses pin 0x19 for the headset mic without jack detection.
    
    Following verification on the CF-SZ6 and discussions with the original
    patch author, i determined that the update to
    ALC269_FIXUP_ASPIRE_HEADSET_MIC is the appropriate solution. This change
    is custom-designed for the CF-SZ6's unique hardware setup, which includes
    a single 3.5mm jack for both mic and audio output, connecting the headset
    microphone to pin 0x19 without the use of jack detection.
    
    Fixes: 0fca97a29b83 ("ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk")
    Signed-off-by: I Gede Agastya Darma Laksana <gedeagas22@gmail.com>
    Cc: <stable@vger.kernel.org>
    Message-ID: <20240401174602.14133-1-gedeagas22@gmail.com>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Wed Mar 20 08:55:52 2024 +0100

    arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
    
    commit e12e28009e584c8f8363439f6a928ec86278a106 upstream.
    
    Several Qualcomm Bluetooth controllers lack persistent storage for the
    device address and instead one can be provided by the boot firmware
    using the 'local-bd-address' devicetree property.
    
    The Bluetooth bindings clearly states that the address should be
    specified in little-endian order, but due to a long-standing bug in the
    Qualcomm driver which reversed the address some boot firmware has been
    providing the address in big-endian order instead.
    
    The boot firmware in SC7180 Trogdor Chromebooks is known to be affected
    so mark the 'local-bd-address' property as broken to maintain backwards
    compatibility with older firmware when fixing the underlying driver bug.
    
    Note that ChromeOS always updates the kernel and devicetree in lockstep
    so that there is no need to handle backwards compatibility with older
    devicetrees.
    
    Fixes: 7ec3e67307f8 ("arm64: dts: qcom: sc7180-trogdor: add initial trogdor and lazor dt")
    Cc: stable@vger.kernel.org      # 5.10
    Cc: Rob Clark <robdclark@chromium.org>
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Acked-by: Bjorn Andersson <andersson@kernel.org>
    Reviewed-by: Bjorn Andersson <andersson@kernel.org>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw [+ + +]

Author: Stephen Lee <slee08177@gmail.com>
Date:   Mon Mar 25 18:01:31 2024 -0700

    ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
    
    [ Upstream commit fc563aa900659a850e2ada4af26b9d7a3de6c591 ]
    
    In snd_soc_info_volsw(), mask is generated by figuring out the index of
    the most significant bit set in max and converting the index to a
    bitmask through bit shift 1. Unintended wraparound occurs when max is an
    integer value with msb bit set. Since the bit shift value 1 is treated
    as an integer type, the left shift operation will wraparound and set
    mask to 0 instead of all 1's. In order to fix this, we type cast 1 as
    `1ULL` to prevent the wraparound.
    
    Fixes: 7077148fb50a ("ASoC: core: Split ops out of soc-core.c")
    Signed-off-by: Stephen Lee <slee08177@gmail.com>
    Link: https://msgid.link/r/20240326010131.6211-1-slee08177@gmail.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: rt5682-sdw: fix locking sequence [+ + +]

Author: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Date:   Mon Mar 25 17:18:12 2024 -0500

    ASoC: rt5682-sdw: fix locking sequence
    
    [ Upstream commit 310a5caa4e861616a27a83c3e8bda17d65026fa8 ]
    
    The disable_irq_lock protects the 'disable_irq' value, we need to lock
    before testing it.
    
    Fixes: 02fb23d72720 ("ASoC: rt5682-sdw: fix for JD event handling in ClockStop Mode0")
    Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
    Reviewed-by: Chao Song <chao.song@linux.intel.com>
    Link: https://msgid.link/r/20240325221817.206465-2-pierre-louis.bossart@linux.intel.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: rt711-sdca: fix locking sequence [+ + +]

Author: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Date:   Mon Mar 25 17:18:13 2024 -0500

    ASoC: rt711-sdca: fix locking sequence
    
    [ Upstream commit ee287771644394d071e6a331951ee8079b64f9a7 ]
    
    The disable_irq_lock protects the 'disable_irq' value, we need to lock
    before testing it.
    
    Fixes: 23adeb7056ac ("ASoC: rt711-sdca: fix for JD event handling in ClockStop Mode0")
    Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
    Reviewed-by: Chao Song <chao.song@linux.intel.com>
    Link: https://msgid.link/r/20240325221817.206465-3-pierre-louis.bossart@linux.intel.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: rt711-sdw: fix locking sequence [+ + +]

Author: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Date:   Mon Mar 25 17:18:14 2024 -0500

    ASoC: rt711-sdw: fix locking sequence
    
    [ Upstream commit aae86cfd8790bcc7693a5a0894df58de5cb5128c ]
    
    The disable_irq_lock protects the 'disable_irq' value, we need to lock
    before testing it.
    
    Fixes: b69de265bd0e ("ASoC: rt711: fix for JD event handling in ClockStop Mode0")
    Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
    Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
    Reviewed-by: Chao Song <chao.song@linux.intel.com>
    Link: https://msgid.link/r/20240325221817.206465-4-pierre-louis.bossart@linux.intel.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ata: sata_mv: Fix PCI device ID table declaration compilation warning [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Apr 3 10:06:48 2024 +0200

    ata: sata_mv: Fix PCI device ID table declaration compilation warning
    
    [ Upstream commit 3137b83a90646917c90951d66489db466b4ae106 ]
    
    Building with W=1 shows a warning for an unused variable when CONFIG_PCI
    is diabled:
    
    drivers/ata/sata_mv.c:790:35: error: unused variable 'mv_pci_tbl' [-Werror,-Wunused-const-variable]
    static const struct pci_device_id mv_pci_tbl[] = {
    
    Move the table into the same block that containsn the pci_driver
    definition.
    
    Fixes: 7bb3c5290ca0 ("sata_mv: Remove PCI dependency")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Mar 26 15:53:37 2024 +0100

    ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
    
    [ Upstream commit 52f80bb181a9a1530ade30bc18991900bbb9697f ]
    
    gcc warns about a memcpy() with overlapping pointers because of an
    incorrect size calculation:
    
    In file included from include/linux/string.h:369,
                     from drivers/ata/sata_sx4.c:66:
    In function 'memcpy_fromio',
        inlined from 'pdc20621_get_from_dimm.constprop' at drivers/ata/sata_sx4.c:962:2:
    include/linux/fortify-string.h:97:33: error: '__builtin_memcpy' accessing 4294934464 bytes at offsets 0 and [16, 16400] overlaps 6442385281 bytes at offset -2147450817 [-Werror=restrict]
       97 | #define __underlying_memcpy     __builtin_memcpy
          |                                 ^
    include/linux/fortify-string.h:620:9: note: in expansion of macro '__underlying_memcpy'
      620 |         __underlying_##op(p, q, __fortify_size);                        \
          |         ^~~~~~~~~~~~~
    include/linux/fortify-string.h:665:26: note: in expansion of macro '__fortify_memcpy_chk'
      665 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
          |                          ^~~~~~~~~~~~~~~~~~~~
    include/asm-generic/io.h:1184:9: note: in expansion of macro 'memcpy'
     1184 |         memcpy(buffer, __io_virt(addr), size);
          |         ^~~~~~
    
    The problem here is the overflow of an unsigned 32-bit number to a
    negative that gets converted into a signed 'long', keeping a large
    positive number.
    
    Replace the complex calculation with a more readable min() variant
    that avoids the warning.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: add quirk for broken address properties [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Wed Mar 20 08:55:53 2024 +0100

    Bluetooth: add quirk for broken address properties
    
    commit 39646f29b100566451d37abc4cc8cdd583756dfe upstream.
    
    Some Bluetooth controllers lack persistent storage for the device
    address and instead one can be provided by the boot firmware using the
    'local-bd-address' devicetree property.
    
    The Bluetooth devicetree bindings clearly states that the address should
    be specified in little-endian order, but due to a long-standing bug in
    the Qualcomm driver which reversed the address some boot firmware has
    been providing the address in big-endian order instead.
    
    Add a new quirk that can be set on platforms with broken firmware and
    use it to reverse the address when parsing the property so that the
    underlying driver bug can be fixed.
    
    Fixes: 5c0a1001c8be ("Bluetooth: hci_qca: Add helper to set device address")
    Cc: stable@vger.kernel.org      # 5.1
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: Fix TOCTOU in HCI debugfs implementation [+ + +]

Author: Bastien Nocera <hadess@hadess.net>
Date:   Wed Mar 27 15:24:56 2024 +0100

    Bluetooth: Fix TOCTOU in HCI debugfs implementation
    
    commit 7835fcfd132eb88b87e8eb901f88436f63ab60f7 upstream.
    
    struct hci_dev members conn_info_max_age, conn_info_min_age,
    le_conn_max_interval, le_conn_min_interval, le_adv_max_interval,
    and le_adv_min_interval can be modified from the HCI core code, as well
    through debugfs.
    
    The debugfs implementation, that's only available to privileged users,
    will check for boundaries, making sure that the minimum value being set
    is strictly above the maximum value that already exists, and vice-versa.
    
    However, as both minimum and maximum values can be changed concurrently
    to us modifying them, we need to make sure that the value we check is
    the value we end up using.
    
    For example, with ->conn_info_max_age set to 10, conn_info_min_age_set()
    gets called from vfs handlers to set conn_info_min_age to 8.
    
    In conn_info_min_age_set(), this goes through:
            if (val == 0 || val > hdev->conn_info_max_age)
                    return -EINVAL;
    
    Concurrently, conn_info_max_age_set() gets called to set to set the
    conn_info_max_age to 7:
            if (val == 0 || val > hdev->conn_info_max_age)
                    return -EINVAL;
    That check will also pass because we used the old value (10) for
    conn_info_max_age.
    
    After those checks that both passed, the struct hci_dev access
    is mutex-locked, disabling concurrent access, but that does not matter
    because the invalid value checks both passed, and we'll end up with
    conn_info_min_age = 8 and conn_info_max_age = 7
    
    To fix this problem, we need to lock the structure access before so the
    check and assignment are not interrupted.
    
    This fix was originally devised by the BassCheck[1] team, and
    considered the problem to be an atomicity one. This isn't the case as
    there aren't any concerns about the variable changing while we check it,
    but rather after we check it parallel to another change.
    
    This patch fixes CVE-2024-24858 and CVE-2024-24857.
    
    [1] https://sites.google.com/view/basscheck/
    
    Co-developed-by: Gui-Dong Han <2045gemini@gmail.com>
    Signed-off-by: Gui-Dong Han <2045gemini@gmail.com>
    Link: https://lore.kernel.org/linux-bluetooth/20231222161317.6255-1-2045gemini@gmail.com/
    Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24858
    Link: https://lore.kernel.org/linux-bluetooth/20231222162931.6553-1-2045gemini@gmail.com/
    Link: https://lore.kernel.org/linux-bluetooth/20231222162310.6461-1-2045gemini@gmail.com/
    Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24857
    Fixes: 31ad169148df ("Bluetooth: Add conn info lifetime parameters to debugfs")
    Fixes: 729a1051da6f ("Bluetooth: Expose default LE advertising interval via debugfs")
    Fixes: 71c3b60ec6d2 ("Bluetooth: Move BR/EDR debugfs file creation into hci_debugfs.c")
    Signed-off-by: Bastien Nocera <hadess@hadess.net>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: hci_event: set the conn encrypted before conn establishes [+ + +]

Author: Hui Wang <hui.wang@canonical.com>
Date:   Wed Mar 27 12:30:30 2024 +0800

    Bluetooth: hci_event: set the conn encrypted before conn establishes
    
    commit c569242cd49287d53b73a94233db40097d838535 upstream.
    
    We have a BT headset (Lenovo Thinkplus XT99), the pairing and
    connecting has no problem, once this headset is paired, bluez will
    remember this device and will auto re-connect it whenever the device
    is powered on. The auto re-connecting works well with Windows and
    Android, but with Linux, it always fails. Through debugging, we found
    at the rfcomm connection stage, the bluetooth stack reports
    "Connection refused - security block (0x0003)".
    
    For this device, the re-connecting negotiation process is different
    from other BT headsets, it sends the Link_KEY_REQUEST command before
    the CONNECT_REQUEST completes, and it doesn't send ENCRYPT_CHANGE
    command during the negotiation. When the device sends the "connect
    complete" to hci, the ev->encr_mode is 1.
    
    So here in the conn_complete_evt(), if ev->encr_mode is 1, link type
    is ACL and HCI_CONN_ENCRYPT is not set, we set HCI_CONN_ENCRYPT to
    this conn, and update conn->enc_key_size accordingly.
    
    After this change, this BT headset could re-connect with Linux
    successfully. This is the btmon log after applying the patch, after
    receiving the "Connect Complete" with "Encryption: Enabled", will send
    the command to read encryption key size:
    > HCI Event: Connect Request (0x04) plen 10
            Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
            Class: 0x240404
              Major class: Audio/Video (headset, speaker, stereo, video, vcr)
              Minor class: Wearable Headset Device
              Rendering (Printing, Speaker)
              Audio (Speaker, Microphone, Headset)
            Link type: ACL (0x01)
    ...
    > HCI Event: Link Key Request (0x17) plen 6
            Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
    < HCI Command: Link Key Request Reply (0x01|0x000b) plen 22
            Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
            Link key: ${32-hex-digits-key}
    ...
    > HCI Event: Connect Complete (0x03) plen 11
            Status: Success (0x00)
            Handle: 256
            Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
            Link type: ACL (0x01)
            Encryption: Enabled (0x01)
    < HCI Command: Read Encryption Key... (0x05|0x0008) plen 2
            Handle: 256
    < ACL Data TX: Handle 256 flags 0x00 dlen 10
          L2CAP: Information Request (0x0a) ident 1 len 2
            Type: Extended features supported (0x0002)
    > HCI Event: Command Complete (0x0e) plen 7
          Read Encryption Key Size (0x05|0x0008) ncmd 1
            Status: Success (0x00)
            Handle: 256
            Key size: 16
    
    Cc: stable@vger.kernel.org
    Link: https://github.com/bluez/bluez/issues/704
    Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
    Reviewed-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
    Signed-off-by: Hui Wang <hui.wang@canonical.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bluetooth: qca: fix device-address endianness [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Wed Mar 20 08:55:54 2024 +0100

    Bluetooth: qca: fix device-address endianness
    
    commit 77f45cca8bc55d00520a192f5a7715133591c83e upstream.
    
    The WCN6855 firmware on the Lenovo ThinkPad X13s expects the Bluetooth
    device address in big-endian order when setting it using the
    EDL_WRITE_BD_ADDR_OPCODE command.
    
    Presumably, this is the case for all non-ROME devices which all use the
    EDL_WRITE_BD_ADDR_OPCODE command for this (unlike the ROME devices which
    use a different command and expect the address in little-endian order).
    
    Reverse the little-endian address before setting it to make sure that
    the address can be configured using tools like btmgmt or using the
    'local-bd-address' devicetree property.
    
    Note that this can potentially break systems with boot firmware which
    has started relying on the broken behaviour and is incorrectly passing
    the address via devicetree in big-endian order.
    
    The only device affected by this should be the WCN3991 used in some
    Chromebooks. As ChromeOS updates the kernel and devicetree in lockstep,
    the new 'qcom,local-bd-address-broken' property can be used to determine
    if the firmware is buggy so that the underlying driver bug can be fixed
    without breaking backwards compatibility.
    
    Set the HCI_QUIRK_BDADDR_PROPERTY_BROKEN quirk for such platforms so
    that the address is reversed when parsing the address property.
    
    Fixes: 5c0a1001c8be ("Bluetooth: hci_qca: Add helper to set device address")
    Cc: stable@vger.kernel.org      # 5.1
    Cc: Balakrishna Godavarthi <quic_bgodavar@quicinc.com>
    Cc: Matthias Kaehlcke <mka@chromium.org>
    Tested-by: Nikita Travkin <nikita@trvn.ru> # sc7180
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf, sockmap: Prevent lock inversion deadlock in map delete elem [+ + +]

Author: Jakub Sitnicki <jakub@cloudflare.com>
Date:   Tue Apr 2 12:46:21 2024 +0200

    bpf, sockmap: Prevent lock inversion deadlock in map delete elem
    
    commit ff91059932401894e6c86341915615c5eb0eca48 upstream.
    
    syzkaller started using corpuses where a BPF tracing program deletes
    elements from a sockmap/sockhash map. Because BPF tracing programs can be
    invoked from any interrupt context, locks taken during a map_delete_elem
    operation must be hardirq-safe. Otherwise a deadlock due to lock inversion
    is possible, as reported by lockdep:
    
           CPU0                    CPU1
           ----                    ----
      lock(&htab->buckets[i].lock);
                                   local_irq_disable();
                                   lock(&host->lock);
                                   lock(&htab->buckets[i].lock);
      <Interrupt>
        lock(&host->lock);
    
    Locks in sockmap are hardirq-unsafe by design. We expects elements to be
    deleted from sockmap/sockhash only in task (normal) context with interrupts
    enabled, or in softirq context.
    
    Detect when map_delete_elem operation is invoked from a context which is
    _not_ hardirq-unsafe, that is interrupts are disabled, and bail out with an
    error.
    
    Note that map updates are not affected by this issue. BPF verifier does not
    allow updating sockmap/sockhash from a BPF tracing program today.
    
    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Reported-by: xingwei lee <xrivendell7@gmail.com>
    Reported-by: yue sun <samsun1006219@gmail.com>
    Reported-by: syzbot+bc922f476bd65abbd466@syzkaller.appspotmail.com
    Reported-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com
    Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Tested-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Closes: https://syzkaller.appspot.com/bug?extid=d4066896495db380182e
    Closes: https://syzkaller.appspot.com/bug?extid=bc922f476bd65abbd466
    Link: https://lore.kernel.org/bpf/20240402104621.1050319-1-jakub@cloudflare.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bpf: Protect against int overflow for stack access size [+ + +]

Author: Andrei Matei <andreimatei1@gmail.com>
Date:   Tue Mar 26 22:42:45 2024 -0400

    bpf: Protect against int overflow for stack access size
    
    [ Upstream commit ecc6a2101840177e57c925c102d2d29f260d37c8 ]
    
    This patch re-introduces protection against the size of access to stack
    memory being negative; the access size can appear negative as a result
    of overflowing its signed int representation. This should not actually
    happen, as there are other protections along the way, but we should
    protect against it anyway. One code path was missing such protections
    (fixed in the previous patch in the series), causing out-of-bounds array
    accesses in check_stack_range_initialized(). This patch causes the
    verification of a program with such a non-sensical access size to fail.
    
    This check used to exist in a more indirect way, but was inadvertendly
    removed in a833a17aeac7.
    
    Fixes: a833a17aeac7 ("bpf: Fix verification of indirect var-off stack access")
    Reported-by: syzbot+33f4297b5f927648741a@syzkaller.appspotmail.com
    Reported-by: syzbot+aafd0513053a1cbf52ef@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/bpf/CAADnVQLORV5PT0iTAhRER+iLBTkByCYNBYyvBSgjN1T31K+gOw@mail.gmail.com/
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
    Link: https://lore.kernel.org/r/20240327024245.318299-3-andreimatei1@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cifs: Fix caching to try to do open O_WRONLY as rdwr on server [+ + +]

Author: David Howells <dhowells@redhat.com>
Date:   Tue Apr 2 10:11:35 2024 +0100

    cifs: Fix caching to try to do open O_WRONLY as rdwr on server
    
    [ Upstream commit e9e62243a3e2322cf639f653a0b0a88a76446ce7 ]
    
    When we're engaged in local caching of a cifs filesystem, we cannot perform
    caching of a partially written cache granule unless we can read the rest of
    the granule.  This can result in unexpected access errors being reported to
    the user.
    
    Fix this by the following: if a file is opened O_WRONLY locally, but the
    mount was given the "-o fsc" flag, try first opening the remote file with
    GENERIC_READ|GENERIC_WRITE and if that returns -EACCES, try dropping the
    GENERIC_READ and doing the open again.  If that last succeeds, invalidate
    the cache for that file as for O_DIRECT.
    
    Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite")
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Steve French <sfrench@samba.org>
    cc: Shyam Prasad N <nspmangalore@gmail.com>
    cc: Rohith Surabattula <rohiths.msft@gmail.com>
    cc: Jeff Layton <jlayton@kernel.org>
    cc: linux-cifs@vger.kernel.org
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

cifs: Fix duplicate fscache cookie warnings [+ + +]

Author: David Howells <dhowells@redhat.com>
Date:   Wed Mar 27 14:13:24 2024 +0000

    cifs: Fix duplicate fscache cookie warnings
    
    [ Upstream commit 8876a37277cb832e1861c35f8c661825179f73f5 ]
    
    fscache emits a lot of duplicate cookie warnings with cifs because the
    index key for the fscache cookies does not include everything that the
    cifs_find_inode() function does.  The latter is used with iget5_locked() to
    distinguish between inodes in the local inode cache.
    
    Fix this by adding the creation time and file type to the fscache cookie
    key.
    
    Additionally, add a couple of comments to note that if one is changed the
    other must be also.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite")
    cc: Shyam Prasad N <nspmangalore@gmail.com>
    cc: Rohith Surabattula <rohiths.msft@gmail.com>
    cc: Jeff Layton <jlayton@kernel.org>
    cc: linux-cifs@vger.kernel.org
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dm integrity: fix out-of-range warning [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Mar 28 15:30:39 2024 +0100

    dm integrity: fix out-of-range warning
    
    [ Upstream commit 8e91c2342351e0f5ef6c0a704384a7f6fc70c3b2 ]
    
    Depending on the value of CONFIG_HZ, clang complains about a pointless
    comparison:
    
    drivers/md/dm-integrity.c:4085:12: error: result of comparison of
                            constant 42949672950 with expression of type
                            'unsigned int' is always false
                            [-Werror,-Wtautological-constant-out-of-range-compare]
                            if (val >= (uint64_t)UINT_MAX * 1000 / HZ) {
    
    As the check remains useful for other configurations, shut up the
    warning by adding a second type cast to uint64_t.
    
    Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
    Reviewed-by: Justin Stitt <justinstitt@google.com>
    Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

dma-buf: Fix NULL pointer dereference in sanitycheck() [+ + +]

Author: Pavel Sakharov <p.sakharov@ispras.ru>
Date:   Wed Mar 20 04:15:23 2024 +0500

    dma-buf: Fix NULL pointer dereference in sanitycheck()
    
    [ Upstream commit 2295bd846765c766701e666ed2e4b35396be25e6 ]
    
    If due to a memory allocation failure mock_chain() returns NULL, it is
    passed to dma_fence_enable_sw_signaling() resulting in NULL pointer
    dereference there.
    
    Call dma_fence_enable_sw_signaling() only if mock_chain() succeeds.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Fixes: d62c43a953ce ("dma-buf: Enable signaling on fence for selftests")
    Signed-off-by: Pavel Sakharov <p.sakharov@ispras.ru>
    Reviewed-by: Christian Kц╤nig <christian.koenig@amd.com>
    Signed-off-by: Christian Kц╤nig <christian.koenig@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240319231527.1821372-1-p.sakharov@ispras.ru
    Signed-off-by: Sasha Levin <sashal@kernel.org>

driver core: Introduce device_link_wait_removal() [+ + +]

Author: Herve Codina <herve.codina@bootlin.com>
Date:   Mon Mar 25 16:21:25 2024 +0100

    driver core: Introduce device_link_wait_removal()
    
    commit 0462c56c290a99a7f03e817ae5b843116dfb575c upstream.
    
    The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
    introduces a workqueue to release the consumer and supplier devices used
    in the devlink.
    In the job queued, devices are release and in turn, when all the
    references to these devices are dropped, the release function of the
    device itself is called.
    
    Nothing is present to provide some synchronisation with this workqueue
    in order to ensure that all ongoing releasing operations are done and
    so, some other operations can be started safely.
    
    For instance, in the following sequence:
      1) of_platform_depopulate()
      2) of_overlay_remove()
    
    During the step 1, devices are released and related devlinks are removed
    (jobs pushed in the workqueue).
    During the step 2, OF nodes are destroyed but, without any
    synchronisation with devlink removal jobs, of_overlay_remove() can raise
    warnings related to missing of_node_put():
      ERROR: memory leak, expected refcount 1 instead of 2
    
    Indeed, the missing of_node_put() call is going to be done, too late,
    from the workqueue job execution.
    
    Introduce device_link_wait_removal() to offer a way to synchronize
    operations waiting for the end of devlink removals (i.e. end of
    workqueue jobs).
    Also, as a flushing operation is done on the workqueue, the workqueue
    used is moved from a system-wide workqueue to a local one.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Herve Codina <herve.codina@bootlin.com>
    Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
    Reviewed-by: Nuno Sa <nuno.sa@analog.com>
    Reviewed-by: Saravana Kannan <saravanak@google.com>
    Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Link: https://lore.kernel.org/r/20240325152140.198219-2-herve.codina@bootlin.com
    Signed-off-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported [+ + +]

Author: Pu Lehui <pulehui@huawei.com>
Date:   Tue Mar 12 01:20:53 2024 +0000

    drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported
    
    [ Upstream commit ea6873118493019474abbf57d5a800da365734df ]
    
    RISC-V perf driver does not yet support branch sampling. Although the
    specification is in the works [0], it is best to disable such events
    until support is available, otherwise we will get unexpected results.
    Due to this reason, two riscv bpf testcases get_branch_snapshot and
    perf_branches/perf_branches_hw fail.
    
    Link: https://github.com/riscv/riscv-control-transfer-records [0]
    Fixes: f5bfa23f576f ("RISC-V: Add a perf core library for pmu drivers")
    Signed-off-by: Pu Lehui <pulehui@huawei.com>
    Reviewed-by: Atish Patra <atishp@rivosinc.com>
    Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
    Link: https://lore.kernel.org/r/20240312012053.1178140-1-pulehui@huaweicloud.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drivers: net: convert to boolean for the mac_managed_pm flag [+ + +]

Author: Denis Kirjanov <dkirjanov@suse.de>
Date:   Thu Oct 27 21:45:02 2022 +0300

    drivers: net: convert to boolean for the mac_managed_pm flag
    
    [ Upstream commit eca485d22165695587bed02d8b9d0f7f44246c4a ]
    
    Signed-off-by: Dennis Kirjanov <dkirjanov@suse.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: cbc17e7802f5 ("net: fec: Set mac_managed_pm during probe")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd: Add concept of running prepare_suspend() sequence for IP blocks [+ + +]

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Fri Oct 6 13:50:21 2023 -0500

    drm/amd: Add concept of running prepare_suspend() sequence for IP blocks
    
    [ Upstream commit cb11ca3233aa3303dc11dca25977d2e7f24be00f ]
    
    If any IP blocks allocate memory during their hw_fini() sequence
    this can cause the suspend to fail under memory pressure.  Introduce
    a new phase that IP blocks can use to allocate memory before suspend
    starts so that it can potentially be evicted into swap instead.
    
    Reviewed-by: Christian Kц╤nig <christian.koenig@amd.com>
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Stable-dep-of: ca299b4512d4 ("drm/amd: Flush GFXOFF requests in prepare stage")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd: Evict resources during PM ops prepare() callback [+ + +]

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Fri Oct 6 13:50:20 2023 -0500

    drm/amd: Evict resources during PM ops prepare() callback
    
    [ Upstream commit 5095d5418193eb2748c7d8553c7150b8f1c44696 ]
    
    Linux PM core has a prepare() callback run before suspend.
    
    If the system is under high memory pressure, the resources may need
    to be evicted into swap instead.  If the storage backing for swap
    is offlined during the suspend() step then such a call may fail.
    
    So move this step into prepare() to move evict majority of
    resources and update all non-pmops callers to call the same callback.
    
    Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
    Reviewed-by: Christian Kц╤nig <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Stable-dep-of: ca299b4512d4 ("drm/amd: Flush GFXOFF requests in prepare stage")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/amd: Flush GFXOFF requests in prepare stage [+ + +]

Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Wed Mar 20 13:32:21 2024 -0500

    drm/amd: Flush GFXOFF requests in prepare stage
    
    [ Upstream commit ca299b4512d4b4f516732a48ce9aa19d91f4473e ]
    
    If the system hasn't entered GFXOFF when suspend starts it can cause
    hangs accessing GC and RLC during the suspend stage.
    
    Cc: <stable@vger.kernel.org> # 6.1.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
    Cc: <stable@vger.kernel.org> # 6.1.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
    Cc: <stable@vger.kernel.org> # 6.1.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()")
    Cc: <stable@vger.kernel.org> # 6.1.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend")
    Cc: <stable@vger.kernel.org> # 6.6.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
    Cc: <stable@vger.kernel.org> # 6.6.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
    Cc: <stable@vger.kernel.org> # 6.6.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()")
    Cc: <stable@vger.kernel.org> # 6.6.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend")
    Cc: <stable@vger.kernel.org> # 6.1+
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132
    Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

drm/panfrost: fix power transition timeout warnings [+ + +]

Author: Christian Hewitt <christianshewitt@gmail.com>
Date:   Fri Mar 22 16:45:25 2024 +0000

    drm/panfrost: fix power transition timeout warnings
    
    [ Upstream commit 2bd02f5a0bac4bb13e0da18652dc75ba0e4958ec ]
    
    Increase the timeout value to prevent system logs on Amlogic boards flooding
    with power transition warnings:
    
    [   13.047638] panfrost ffe40000.gpu: shader power transition timeout
    [   13.048674] panfrost ffe40000.gpu: l2 power transition timeout
    [   13.937324] panfrost ffe40000.gpu: shader power transition timeout
    [   13.938351] panfrost ffe40000.gpu: l2 power transition timeout
    ...
    [39829.506904] panfrost ffe40000.gpu: shader power transition timeout
    [39829.507938] panfrost ffe40000.gpu: l2 power transition timeout
    [39949.508369] panfrost ffe40000.gpu: shader power transition timeout
    [39949.509405] panfrost ffe40000.gpu: l2 power transition timeout
    
    The 2000 value has been found through trial and error testing with devices
    using G52 and G31 GPUs.
    
    Fixes: 22aa1a209018 ("drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()")
    Signed-off-by: Christian Hewitt <christianshewitt@gmail.com>
    Reviewed-by: Steven Price <steven.price@arm.com>
    Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
    Signed-off-by: Steven Price <steven.price@arm.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240322164525.2617508-1-christianshewitt@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

erspan: make sure erspan_base_hdr is present in skb->head [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Mar 28 11:22:48 2024 +0000

    erspan: make sure erspan_base_hdr is present in skb->head
    
    commit 17af420545a750f763025149fa7b833a4fc8b8f0 upstream.
    
    syzbot reported a problem in ip6erspan_rcv() [1]
    
    Issue is that ip6erspan_rcv() (and erspan_rcv()) no longer make
    sure erspan_base_hdr is present in skb linear part (skb->head)
    before getting @ver field from it.
    
    Add the missing pskb_may_pull() calls.
    
    v2: Reload iph pointer in erspan_rcv() after pskb_may_pull()
        because skb->head might have changed.
    
    [1]
    
     BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
     BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2756 [inline]
     BUG: KMSAN: uninit-value in ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
     BUG: KMSAN: uninit-value in gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
      pskb_may_pull_reason include/linux/skbuff.h:2742 [inline]
      pskb_may_pull include/linux/skbuff.h:2756 [inline]
      ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline]
      gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610
      ip6_protocol_deliver_rcu+0x1d4c/0x2ca0 net/ipv6/ip6_input.c:438
      ip6_input_finish net/ipv6/ip6_input.c:483 [inline]
      NF_HOOK include/linux/netfilter.h:314 [inline]
      ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492
      ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586
      dst_input include/net/dst.h:460 [inline]
      ip6_rcv_finish+0x955/0x970 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:314 [inline]
      ipv6_rcv+0xde/0x390 net/ipv6/ip6_input.c:310
      __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
      __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5652
      netif_receive_skb_internal net/core/dev.c:5738 [inline]
      netif_receive_skb+0x58/0x660 net/core/dev.c:5798
      tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1549
      tun_get_user+0x5566/0x69e0 drivers/net/tun.c:2002
      tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
      call_write_iter include/linux/fs.h:2108 [inline]
      new_sync_write fs/read_write.c:497 [inline]
      vfs_write+0xb63/0x1520 fs/read_write.c:590
      ksys_write+0x20f/0x4c0 fs/read_write.c:643
      __do_sys_write fs/read_write.c:655 [inline]
      __se_sys_write fs/read_write.c:652 [inline]
      __x64_sys_write+0x93/0xe0 fs/read_write.c:652
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was created at:
      slab_post_alloc_hook mm/slub.c:3804 [inline]
      slab_alloc_node mm/slub.c:3845 [inline]
      kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
      __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668
      alloc_skb include/linux/skbuff.h:1318 [inline]
      alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504
      sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795
      tun_alloc_skb drivers/net/tun.c:1525 [inline]
      tun_get_user+0x209a/0x69e0 drivers/net/tun.c:1846
      tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
      call_write_iter include/linux/fs.h:2108 [inline]
      new_sync_write fs/read_write.c:497 [inline]
      vfs_write+0xb63/0x1520 fs/read_write.c:590
      ksys_write+0x20f/0x4c0 fs/read_write.c:643
      __do_sys_write fs/read_write.c:655 [inline]
      __se_sys_write fs/read_write.c:652 [inline]
      __x64_sys_write+0x93/0xe0 fs/read_write.c:652
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    CPU: 1 PID: 5045 Comm: syz-executor114 Not tainted 6.9.0-rc1-syzkaller-00021-g962490525cff #0
    
    Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup")
    Reported-by: syzbot+1c1cf138518bf0c53d68@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netdev/000000000000772f2c0614b66ef7@google.com/
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/20240328112248.1101491-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() [+ + +]

Author: Jann Horn <jannh@google.com>
Date:   Fri Nov 24 16:08:22 2023 +0100

    fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
    
    [ Upstream commit 055ca83559912f2cfd91c9441427bac4caf3c74e ]
    
    When you try to splice between a normal pipe and a notification pipe,
    get_pipe_info(..., true) fails, so splice() falls back to treating the
    notification pipe like a normal pipe - so we end up in
    iter_file_splice_write(), which first locks the input pipe, then calls
    vfs_iter_write(), which locks the output pipe.
    
    Lockdep complains about that, because we're taking a pipe lock while
    already holding another pipe lock.
    
    I think this probably (?) can't actually lead to deadlocks, since you'd
    need another way to nest locking a normal pipe into locking a
    watch_queue pipe, but the lockdep annotations don't make that clear.
    
    Bail out earlier in pipe_write() for notification pipes, before taking
    the pipe lock.
    
    Reported-and-tested-by: <syzbot+011e4ea1da6692cf881c@syzkaller.appspotmail.com>
    Closes: https://syzkaller.appspot.com/bug?extid=011e4ea1da6692cf881c
    Fixes: c73be61cede5 ("pipe: Add general notification queue support")
    Signed-off-by: Jann Horn <jannh@google.com>
    Link: https://lore.kernel.org/r/20231124150822.2121798-1-jannh@google.com
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

gro: fix ownership transfer [+ + +]

Author: Antoine Tenart <atenart@kernel.org>
Date:   Tue Mar 26 12:33:59 2024 +0100

    gro: fix ownership transfer
    
    commit ed4cccef64c1d0d5b91e69f7a8a6697c3a865486 upstream.
    
    If packets are GROed with fraglist they might be segmented later on and
    continue their journey in the stack. In skb_segment_list those skbs can
    be reused as-is. This is an issue as their destructor was removed in
    skb_gro_receive_list but not the reference to their socket, and then
    they can't be orphaned. Fix this by also removing the reference to the
    socket.
    
    For example this could be observed,
    
      kernel BUG at include/linux/skbuff.h:3131!  (skb_orphan)
      RIP: 0010:ip6_rcv_core+0x11bc/0x19a0
      Call Trace:
       ipv6_list_rcv+0x250/0x3f0
       __netif_receive_skb_list_core+0x49d/0x8f0
       netif_receive_skb_list_internal+0x634/0xd40
       napi_complete_done+0x1d2/0x7d0
       gro_cell_poll+0x118/0x1f0
    
    A similar construction is found in skb_gro_receive, apply the same
    change there.
    
    Fixes: 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference")
    Signed-off-by: Antoine Tenart <atenart@kernel.org>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i40e: Enforce software interrupt during busy-poll exit [+ + +]

Author: Ivan Vecera <ivecera@redhat.com>
Date:   Sat Mar 16 12:38:29 2024 +0100

    i40e: Enforce software interrupt during busy-poll exit
    
    [ Upstream commit ea558de7238bb12c3435c47f0631e9d17bf4a09f ]
    
    As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
    during poll exit") followed by commit 23be7075b318 ("ice: fix software
    generating extra interrupts") I'm seeing the similar issue also with
    i40e driver.
    
    In certain situation when busy-loop is enabled together with adaptive
    coalescing, the driver occasionally misses that there are outstanding
    descriptors to clean when exiting busy poll.
    
    Try to catch the remaining work by triggering a software interrupt
    when exiting busy poll. No extra interrupts will be generated when
    busy polling is not used.
    
    The issue was found when running sockperf ping-pong tcp test with
    adaptive coalescing and busy poll enabled (50 as value busy_pool
    and busy_read sysctl knobs) and results in huge latency spikes
    with more than 100000us.
    
    The fix is inspired from the ice driver and do the following:
    1) During napi poll exit in case of busy-poll (napo_complete_done()
       returns false) this is recorded to q_vector that we were in busy
       loop.
    2) Extends i40e_buildreg_itr() to be able to add an enforced software
       interrupt into built value
    2) In i40e_update_enable_itr() enforces a software interrupt trigger
       if we are exiting busy poll to catch any pending clean-ups
    3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
       20K interrupts per second to limit the number of these sw interrupts.
    
    Test results
    ============
    Prior:
    [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
    sockperf: == version #3.10-no.git ==
    sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
    
    [ 0] IP = 10.9.9.1        PORT = 11111 # TCP
    sockperf: Warmup stage (sending a few dummy messages)...
    sockperf: Starting test...
    sockperf: Test end (interrupted by timer)
    sockperf: Test ended
    sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562
    sockperf: ========= Printing statistics for Server No: 0
    sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473
    sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725])
    sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
    sockperf: Summary: Latency is 24.571 usec
    sockperf: Total 2429473 observations; each percentile contains 24294.73 observations
    sockperf: ---> <MAX> observation = 103294.331
    sockperf: ---> percentile 99.999 =   45.633
    sockperf: ---> percentile 99.990 =   37.013
    sockperf: ---> percentile 99.900 =   35.910
    sockperf: ---> percentile 99.000 =   33.390
    sockperf: ---> percentile 90.000 =   28.626
    sockperf: ---> percentile 75.000 =   27.741
    sockperf: ---> percentile 50.000 =   26.743
    sockperf: ---> percentile 25.000 =   25.614
    sockperf: ---> <MIN> observation =   12.220
    
    After:
    [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
    sockperf: == version #3.10-no.git ==
    sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
    
    [ 0] IP = 10.9.9.1        PORT = 11111 # TCP
    sockperf: Warmup stage (sending a few dummy messages)...
    sockperf: Starting test...
    sockperf: Test end (interrupted by timer)
    sockperf: Test ended
    sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054
    sockperf: ========= Printing statistics for Server No: 0
    sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186
    sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975])
    sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
    sockperf: Summary: Latency is 24.965 usec
    sockperf: Total 2391186 observations; each percentile contains 23911.86 observations
    sockperf: ---> <MAX> observation =  195.841
    sockperf: ---> percentile 99.999 =   45.026
    sockperf: ---> percentile 99.990 =   39.009
    sockperf: ---> percentile 99.900 =   35.922
    sockperf: ---> percentile 99.000 =   33.482
    sockperf: ---> percentile 90.000 =   28.902
    sockperf: ---> percentile 75.000 =   27.821
    sockperf: ---> percentile 50.000 =   26.860
    sockperf: ---> percentile 25.000 =   25.685
    sockperf: ---> <MIN> observation =   12.277
    
    Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit")
    Reported-by: Hugo Ferreira <hferreir@redhat.com>
    Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
    Signed-off-by: Ivan Vecera <ivecera@redhat.com>
    Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

i40e: fix i40e_count_filters() to count only active/new filters [+ + +]

Author: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Date:   Wed Mar 13 10:44:00 2024 +0100

    i40e: fix i40e_count_filters() to count only active/new filters
    
    commit eb58c598ce45b7e787568fe27016260417c3d807 upstream.
    
    The bug usually affects untrusted VFs, because they are limited to 18 MACs,
    it affects them badly, not letting to create MAC all filters.
    Not stable to reproduce, it happens when VF user creates MAC filters
    when other MACVLAN operations are happened in parallel.
    But consequence is that VF can't receive desired traffic.
    
    Fix counter to be bumped only for new or active filters.
    
    Fixes: 621650cabee5 ("i40e: Refactoring VF MAC filters counting to make more reliable")
    Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
    Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i40e: Fix VF MAC filter removal [+ + +]

Author: Ivan Vecera <ivecera@redhat.com>
Date:   Fri Mar 29 11:06:37 2024 -0700

    i40e: Fix VF MAC filter removal
    
    commit ea2a1cfc3b2019bdea6324acd3c03606b60d71ad upstream.
    
    Commit 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove
    administratively set MAC") fixed an issue where untrusted VF was
    allowed to remove its own MAC address although this was assigned
    administratively from PF. Unfortunately the introduced check
    is wrong because it causes that MAC filters for other MAC addresses
    including multi-cast ones are not removed.
    
    <snip>
            if (ether_addr_equal(addr, vf->default_lan_addr.addr) &&
                i40e_can_vf_change_mac(vf))
                    was_unimac_deleted = true;
            else
                    continue;
    
            if (i40e_del_mac_filter(vsi, al->list[i].addr)) {
            ...
    </snip>
    
    The else path with `continue` effectively skips any MAC filter
    removal except one for primary MAC addr when VF is allowed to do so.
    Fix the check condition so the `continue` is only done for primary
    MAC address.
    
    Fixes: 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove administratively set MAC")
    Signed-off-by: Ivan Vecera <ivecera@redhat.com>
    Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
    Reviewed-by: Brett Creeley <brett.creeley@amd.com>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Link: https://lore.kernel.org/r/20240329180638.211412-1-anthony.l.nguyen@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i40e: fix vf may be used uninitialized in this function warning [+ + +]

Author: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Date:   Wed Mar 13 10:56:39 2024 +0100

    i40e: fix vf may be used uninitialized in this function warning
    
    commit f37c4eac99c258111d414d31b740437e1925b8e8 upstream.
    
    To fix the regression introduced by commit 52424f974bc5, which causes
    servers hang in very hard to reproduce conditions with resets races.
    Using two sources for the information is the root cause.
    In this function before the fix bumping v didn't mean bumping vf
    pointer. But the code used this variables interchangeably, so stale vf
    could point to different/not intended vf.
    
    Remove redundant "v" variable and iterate via single VF pointer across
    whole function instead to guarantee VF pointer validity.
    
    Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF")
    Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
    Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i40e: Remove _t suffix from enum type names [+ + +]

Author: Ivan Vecera <ivecera@redhat.com>
Date:   Mon Nov 13 15:10:24 2023 -0800

    i40e: Remove _t suffix from enum type names
    
    [ Upstream commit addca9175e5f74cf29e8ad918c38c09b8663b5b8 ]
    
    Enum type names should not be suffixed by '_t'. Either to use
    'typedef enum name name_t' to so plain 'name_t var' instead of
    'enum name_t var'.
    
    Signed-off-by: Ivan Vecera <ivecera@redhat.com>
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Link: https://lore.kernel.org/r/20231113231047.548659-6-anthony.l.nguyen@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: ea558de7238b ("i40e: Enforce software interrupt during busy-poll exit")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

i40e: Store the irq number in i40e_q_vector [+ + +]

Author: Joe Damato <jdamato@fastly.com>
Date:   Fri Oct 7 14:38:40 2022 -0700

    i40e: Store the irq number in i40e_q_vector
    
    [ Upstream commit 6b85a4f39ff7177b2428d4deab1151a31754e391 ]
    
    Make it easy to figure out the IRQ number for a particular i40e_q_vector by
    storing the assigned IRQ in the structure itself.
    
    Signed-off-by: Joe Damato <jdamato@fastly.com>
    Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
    Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Stable-dep-of: ea558de7238b ("i40e: Enforce software interrupt during busy-poll exit")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

inet: inet_defrag: prevent sk release while still in use [+ + +]

Author: Florian Westphal <fw@strlen.de>
Date:   Tue Mar 26 11:18:41 2024 +0100

    inet: inet_defrag: prevent sk release while still in use
    
    [ Upstream commit 18685451fc4e546fc0e718580d32df3c0e5c8272 ]
    
    ip_local_out() and other functions can pass skb->sk as function argument.
    
    If the skb is a fragment and reassembly happens before such function call
    returns, the sk must not be released.
    
    This affects skb fragments reassembled via netfilter or similar
    modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.
    
    Eric Dumazet made an initial analysis of this bug.  Quoting Eric:
      Calling ip_defrag() in output path is also implying skb_orphan(),
      which is buggy because output path relies on sk not disappearing.
    
      A relevant old patch about the issue was :
      8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()")
    
      [..]
    
      net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
      inet socket, not an arbitrary one.
    
      If we orphan the packet in ipvlan, then downstream things like FQ
      packet scheduler will not work properly.
    
      We need to change ip_defrag() to only use skb_orphan() when really
      needed, ie whenever frag_list is going to be used.
    
    Eric suggested to stash sk in fragment queue and made an initial patch.
    However there is a problem with this:
    
    If skb is refragmented again right after, ip_do_fragment() will copy
    head->sk to the new fragments, and sets up destructor to sock_wfree.
    IOW, we have no choice but to fix up sk_wmem accouting to reflect the
    fully reassembled skb, else wmem will underflow.
    
    This change moves the orphan down into the core, to last possible moment.
    As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
    offset into the FRAG_CB, else skb->sk gets clobbered.
    
    This allows to delay the orphaning long enough to learn if the skb has
    to be queued or if the skb is completing the reasm queue.
    
    In the former case, things work as before, skb is orphaned.  This is
    safe because skb gets queued/stolen and won't continue past reasm engine.
    
    In the latter case, we will steal the skb->sk reference, reattach it to
    the head skb, and fix up wmem accouting when inet_frag inflates truesize.
    
    Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
    Diagnosed-by: Eric Dumazet <edumazet@google.com>
    Reported-by: xingwei lee <xrivendell7@gmail.com>
    Reported-by: yue sun <samsun1006219@gmail.com>
    Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv6: Fix infinite recursion in fib6_dump_done(). [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Mon Apr 1 14:10:04 2024 -0700

    ipv6: Fix infinite recursion in fib6_dump_done().
    
    commit d21d40605bca7bd5fc23ef03d4c1ca1f48bc2cae upstream.
    
    syzkaller reported infinite recursive calls of fib6_dump_done() during
    netlink socket destruction.  [1]
    
    From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
    the response was generated.  The following recvmmsg() resumed the dump
    for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due
    to the fault injection.  [0]
    
      12:01:34 executing program 3:
      r0 = socket$nl_route(0x10, 0x3, 0x0)
      sendmsg$nl_route(r0, ... snip ...)
      recvmmsg(r0, ... snip ...) (fail_nth: 8)
    
    Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call
    of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3].  syzkaller stopped
    receiving the response halfway through, and finally netlink_sock_destruct()
    called nlk_sk(sk)->cb.done().
    
    fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it
    is still not NULL.  fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by
    nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling
    itself recursively and hitting the stack guard page.
    
    To avoid the issue, let's set the destructor after kzalloc().
    
    [0]:
    FAULT_INJECTION: forcing a failure.
    name failslab, interval 1, probability 0, space 0, times 0
    CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl (lib/dump_stack.c:117)
     should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
     should_failslab (mm/slub.c:3733)
     kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992)
     inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662)
     rtnl_dump_all (net/core/rtnetlink.c:4029)
     netlink_dump (net/netlink/af_netlink.c:2269)
     netlink_recvmsg (net/netlink/af_netlink.c:1988)
     ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801)
     ___sys_recvmsg (net/socket.c:2846)
     do_recvmmsg (net/socket.c:2943)
     __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)
    
    [1]:
    BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb)
    stack guard page: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
    Workqueue: events netlink_sock_destruct_work
    RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570)
    Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff
    RSP: 0018:ffffc9000d980000 EFLAGS: 00010293
    RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3
    RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358
    RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000
    R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68
    FS:  0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0
    PKRU: 55555554
    Call Trace:
     <#DF>
     </#DF>
     <TASK>
     fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
     fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
     ...
     fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
     fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
     netlink_sock_destruct (net/netlink/af_netlink.c:401)
     __sk_destruct (net/core/sock.c:2177 (discriminator 2))
     sk_destruct (net/core/sock.c:2224)
     __sk_free (net/core/sock.c:2235)
     sk_free (net/core/sock.c:2246)
     process_one_work (kernel/workqueue.c:3259)
     worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
     kthread (kernel/kthread.c:388)
     ret_from_fork (arch/x86/kernel/process.c:153)
     ret_from_fork_asm (arch/x86/entry/entry_64.S:256)
    Modules linked in:
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: syzkaller <syzkaller@googlegroups.com>
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20240401211003.25274-1-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa() [+ + +]

Author: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Date:   Tue Mar 5 17:02:02 2024 +0100

    ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()
    
    [ Upstream commit aec806fb4afba5fe80b09e29351379a4292baa43 ]
    
    Change kzalloc() flags used in ixgbe_ipsec_vf_add_sa() to GFP_ATOMIC, to
    avoid sleeping in IRQ context.
    
    Dan Carpenter, with the help of Smatch, has found following issue:
    The patch eda0333ac293: "ixgbe: add VF IPsec management" from Aug 13,
    2018 (linux-next), leads to the following Smatch static checker
    warning: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917 ixgbe_ipsec_vf_add_sa()
            warn: sleeping in IRQ context
    
    The call tree that Smatch is worried about is:
    ixgbe_msix_other() <- IRQ handler
    -> ixgbe_msg_task()
       -> ixgbe_rcv_msg_from_vf()
          -> ixgbe_ipsec_vf_add_sa()
    
    Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://lore.kernel.org/intel-wired-lan/db31a0b0-4d9f-4e6b-aed8-88266eb5665c@moroto.mountain
    Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
    Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1 [+ + +]

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Tue Apr 2 09:31:22 2024 +0900

    ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
    
    commit 5ed11af19e56f0434ce0959376d136005745a936 upstream.
    
    SMB2_GLOBAL_CAP_ENCRYPTION flag should be used only for 3.0 and
    3.0.2 dialects. This flags set cause compatibility problems with
    other SMB clients.
    
    Reported-by: James Christopher Adduono <jc@adduono.com>
    Tested-by: James Christopher Adduono <jc@adduono.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ksmbd: don't send oplock break if rename fails [+ + +]

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Sun Mar 31 21:58:26 2024 +0900

    ksmbd: don't send oplock break if rename fails
    
    commit c1832f67035dc04fb89e6b591b64e4d515843cda upstream.
    
    Don't send oplock break if rename fails. This patch fix
    smb2.oplock.batch20 test.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ksmbd: validate payload size in ipc response [+ + +]

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Sun Mar 31 21:59:10 2024 +0900

    ksmbd: validate payload size in ipc response
    
    commit a677ebd8ca2f2632ccdecbad7b87641274e15aac upstream.
    
    If installing malicious ksmbd-tools, ksmbd.mountd can return invalid ipc
    response to ksmbd kernel server. ksmbd should validate payload size of
    ipc response from ksmbd.mountd to avoid memory overrun or
    slab-out-of-bounds. This patch validate 3 ipc response that has payload.
    
    Cc: stable@vger.kernel.org
    Reported-by: Chao Ma <machao2019@gmail.com>
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: Fix host-programmed guest events in nVHE [+ + +]

Author: Oliver Upton <oliver.upton@linux.dev>
Date:   Tue Mar 5 18:48:39 2024 +0000

    KVM: arm64: Fix host-programmed guest events in nVHE
    
    commit e89c928bedd77d181edc2df01cb6672184775140 upstream.
    
    Programming PMU events in the host that count during guest execution is
    a feature supported by perf, e.g.
    
      perf stat -e cpu_cycles:G ./lkvm run
    
    While this works for VHE, the guest/host event bitmaps are not carried
    through to the hypervisor in the nVHE configuration. Make
    kvm_pmu_update_vcpu_events() conditional on whether or not _hardware_
    supports PMUv3 rather than if the vCPU as vPMU enabled.
    
    Cc: stable@vger.kernel.org
    Fixes: 84d751a019a9 ("KVM: arm64: Pass pmu events to hyp via vcpu")
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20240305184840.636212-3-oliver.upton@linux.dev
    Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: SVM: Add support for allowing zero SEV ASIDs [+ + +]

Author: Ashish Kalra <ashish.kalra@amd.com>
Date:   Wed Jan 31 15:56:08 2024 -0800

    KVM: SVM: Add support for allowing zero SEV ASIDs
    
    [ Upstream commit 0aa6b90ef9d75b4bd7b6d106d85f2a3437697f91 ]
    
    Some BIOSes allow the end user to set the minimum SEV ASID value
    (CPUID 0x8000001F_EDX) to be greater than the maximum number of
    encrypted guests, or maximum SEV ASID value (CPUID 0x8000001F_ECX)
    in order to dedicate all the SEV ASIDs to SEV-ES or SEV-SNP.
    
    The SEV support, as coded, does not handle the case where the minimum
    SEV ASID value can be greater than the maximum SEV ASID value.
    As a result, the following confusing message is issued:
    
    [   30.715724] kvm_amd: SEV enabled (ASIDs 1007 - 1006)
    
    Fix the support to properly handle this case.
    
    Fixes: 916391a2d1dc ("KVM: SVM: Add support for SEV-ES capability in KVM")
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
    Cc: stable@vger.kernel.org
    Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
    Link: https://lore.kernel.org/r/20240104190520.62510-1-Ashish.Kalra@amd.com
    Link: https://lore.kernel.org/r/20240131235609.4161407-4-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: SVM: enhance info printk's in SEV init [+ + +]

Author: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Date:   Mon May 22 18:12:48 2023 +0200

    KVM: SVM: enhance info printk's in SEV init
    
    [ Upstream commit 6d1bc9754b04075d938b47cf7f7800814b8911a7 ]
    
    Let's print available ASID ranges for SEV/SEV-ES guests.
    This information can be useful for system administrator
    to debug if SEV/SEV-ES fails to enable.
    
    There are a few reasons.
    SEV:
    - NPT is disabled (module parameter)
    - CPU lacks some features (sev, decodeassists)
    - Maximum SEV ASID is 0
    
    SEV-ES:
    - mmio_caching is disabled (module parameter)
    - CPU lacks sev_es feature
    - Minimum SEV ASID value is 1 (can be adjusted in BIOS/UEFI)
    
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Stц╘phane Graber <stgraber@ubuntu.com>
    Cc: kvm@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Link: https://lore.kernel.org/r/20230522161249.800829-3-aleksandr.mikhalitsyn@canonical.com
    [sean: print '0' for min SEV-ES ASID if there are no available ASIDs]
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Stable-dep-of: 0aa6b90ef9d7 ("KVM: SVM: Add support for allowing zero SEV ASIDs")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: SVM: Use unsigned integers when dealing with ASIDs [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Jan 31 15:56:07 2024 -0800

    KVM: SVM: Use unsigned integers when dealing with ASIDs
    
    [ Upstream commit 466eec4a22a76c462781bf6d45cb02cbedf21a61 ]
    
    Convert all local ASID variables and parameters throughout the SEV code
    from signed integers to unsigned integers.  As ASIDs are fundamentally
    unsigned values, and the global min/max variables are appropriately
    unsigned integers, too.
    
    Functionally, this is a glorified nop as KVM guarantees min_sev_asid is
    non-zero, and no CPU supports -1u as the _only_ asid, i.e. the signed vs.
    unsigned goof won't cause problems in practice.
    
    Opportunistically use sev_get_asid() in sev_flush_encrypted_page() instead
    of open coding an equivalent.
    
    Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
    Link: https://lore.kernel.org/r/20240131235609.4161407-3-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Stable-dep-of: 0aa6b90ef9d7 ("KVM: SVM: Add support for allowing zero SEV ASIDs")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: SVM: WARN, but continue, if misc_cg_set_capacity() fails [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Tue Jun 6 17:44:49 2023 -0700

    KVM: SVM: WARN, but continue, if misc_cg_set_capacity() fails
    
    [ Upstream commit 106ed2cad9f7bd803bd31a18fe7a9219b077bf95 ]
    
    WARN and continue if misc_cg_set_capacity() fails, as the only scenario
    in which it can fail is if the specified resource is invalid, which should
    never happen when CONFIG_KVM_AMD_SEV=y.  Deliberately not bailing "fixes"
    a theoretical bug where KVM would leak the ASID bitmaps on failure, which
    again can't happen.
    
    If the impossible should happen, the end result is effectively the same
    with respect to SEV and SEV-ES (they are unusable), while continuing on
    has the advantage of letting KVM load, i.e. userspace can still run
    non-SEV guests.
    
    Reported-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Link: https://lore.kernel.org/r/20230607004449.1421131-1-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Stable-dep-of: 0aa6b90ef9d7 ("KVM: SVM: Add support for allowing zero SEV ASIDs")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: x86: Add BHI_NO [+ + +]

Author: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Date:   Wed Mar 13 09:49:17 2024 -0700

    KVM: x86: Add BHI_NO
    
    Intel processors that aren't vulnerable to BHI will set
    commit ed2e8d49b54d677f3123668a21a57822d679651f upstream.
    
    MSR_IA32_ARCH_CAPABILITIES[BHI_NO] = 1;. Guests may use this BHI_NO bit to
    determine if they need to implement BHI mitigations or not.  Allow this bit
    to be passed to the guests.
    
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux: Linux 6.1.85 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Wed Apr 10 16:28:36 2024 +0200

    Linux 6.1.85
    
    Link: https://lore.kernel.org/r/20240408125256.218368873@linuxfoundation.org
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Kelsey Steele <kelseysteele@linux.microsoft.com>
    Tested-by: Salvatore Bonaccorso <carnil@debian.org>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: Pavel Machek (CIP) <pavel@denx.de>
    Tested-by: Conor Dooley <conor.dooley@microchip.com>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Sven Joachim <svenjoac@gmx.de>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Link: https://lore.kernel.org/r/20240409172805.638917723@linuxfoundation.org
    Tested-by: kernelci.org bot <bot@kernelci.org>
    Link: https://lore.kernel.org/r/20240409173524.517362803@linuxfoundation.org
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mlxbf_gige: call request_irq() after NAPI initialized [+ + +]

Author: David Thompson <davthompson@nvidia.com>
Date:   Mon Mar 25 14:36:27 2024 -0400

    mlxbf_gige: call request_irq() after NAPI initialized
    
    [ Upstream commit f7442a634ac06b953fc1f7418f307b25acd4cfbc ]
    
    The mlxbf_gige driver encounters a NULL pointer exception in
    mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
    the exception is as follows:
    a) enable kdump
    b) trigger kdump via "echo c > /proc/sysrq-trigger"
    c) kdump kernel executes
    d) kdump kernel loads mlxbf_gige module
    e) the mlxbf_gige module runs its open() as the
       the "oob_net0" interface is brought up
    f) mlxbf_gige module will experience an exception
       during its open(), something like:
    
         Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
         Mem abort info:
           ESR = 0x0000000086000004
           EC = 0x21: IABT (current EL), IL = 32 bits
           SET = 0, FnV = 0
           EA = 0, S1PTW = 0
           FSC = 0x04: level 0 translation fault
         user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
         [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
         Internal error: Oops: 0000000086000004 [#1] SMP
         CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield #37-Ubuntu
         Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
         pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
         pc : 0x0
         lr : __napi_poll+0x40/0x230
         sp : ffff800008003e00
         x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
         x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
         x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
         x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
         x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
         x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
         x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
         x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
         x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
         x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
         Call trace:
          0x0
          net_rx_action+0x178/0x360
          __do_softirq+0x15c/0x428
          __irq_exit_rcu+0xac/0xec
          irq_exit+0x18/0x2c
          handle_domain_irq+0x6c/0xa0
          gic_handle_irq+0xec/0x1b0
          call_on_irq_stack+0x20/0x2c
          do_interrupt_handler+0x5c/0x70
          el1_interrupt+0x30/0x50
          el1h_64_irq_handler+0x18/0x2c
          el1h_64_irq+0x7c/0x80
          __setup_irq+0x4c0/0x950
          request_threaded_irq+0xf4/0x1bc
          mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
          mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
          __dev_open+0x100/0x220
          __dev_change_flags+0x16c/0x1f0
          dev_change_flags+0x2c/0x70
          do_setlink+0x220/0xa40
          __rtnl_newlink+0x56c/0x8a0
          rtnl_newlink+0x58/0x84
          rtnetlink_rcv_msg+0x138/0x3c4
          netlink_rcv_skb+0x64/0x130
          rtnetlink_rcv+0x20/0x30
          netlink_unicast+0x2ec/0x360
          netlink_sendmsg+0x278/0x490
          __sock_sendmsg+0x5c/0x6c
          ____sys_sendmsg+0x290/0x2d4
          ___sys_sendmsg+0x84/0xd0
          __sys_sendmsg+0x70/0xd0
          __arm64_sys_sendmsg+0x2c/0x40
          invoke_syscall+0x78/0x100
          el0_svc_common.constprop.0+0x54/0x184
          do_el0_svc+0x30/0xac
          el0_svc+0x48/0x160
          el0t_64_sync_handler+0xa4/0x12c
          el0t_64_sync+0x1a4/0x1a8
         Code: bad PC value
         ---[ end trace 7d1c3f3bf9d81885 ]---
         Kernel panic - not syncing: Oops: Fatal exception in interrupt
         Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
         PHYS_OFFSET: 0x80000000
         CPU features: 0x0,000005c1,a3332a5a
         Memory Limit: none
         ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
    
    The exception happens because there is a pending RX interrupt before the
    call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
    immediately after this request_irq() completes. The RX IRQ handler runs
    "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
    and "napi_enable()", both which happen later in the open() logic.
    
    The logic in mlxbf_gige_open() must fully initialize NAPI before any calls
    to request_irq() execute.
    
    Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
    Signed-off-by: David Thompson <davthompson@nvidia.com>
    Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
    Link: https://lore.kernel.org/r/20240325183627.7641-1-davthompson@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mlxbf_gige: stop interface during shutdown [+ + +]

Author: David Thompson <davthompson@nvidia.com>
Date:   Mon Mar 25 17:09:29 2024 -0400

    mlxbf_gige: stop interface during shutdown
    
    commit 09ba28e1cd3cf715daab1fca6e1623e22fd754a6 upstream.
    
    The mlxbf_gige driver intermittantly encounters a NULL pointer
    exception while the system is shutting down via "reboot" command.
    The mlxbf_driver will experience an exception right after executing
    its shutdown() method.  One example of this exception is:
    
    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
    Mem abort info:
      ESR = 0x0000000096000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
    [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G S         OE     5.15.0-bf.6.gef6992a #1
    Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.0.2.12669 Apr 21 2023
    pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
    lr : mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
    sp : ffff8000080d3c10
    x29: ffff8000080d3c10 x28: ffffcce72cbb7000 x27: ffff8000080d3d58
    x26: ffff0000814e7340 x25: ffff331cd1a05000 x24: ffffcce72c4ea008
    x23: ffff0000814e4b40 x22: ffff0000814e4d10 x21: ffff0000814e4128
    x20: 0000000000000000 x19: ffff0000814e4a80 x18: ffffffffffffffff
    x17: 000000000000001c x16: ffffcce72b4553f4 x15: ffff80008805b8a7
    x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
    x11: 7f7f7f7f7f7f7f7f x10: c2ac898b17576267 x9 : ffffcce720fa5404
    x8 : ffff000080812138 x7 : 0000000000002e9a x6 : 0000000000000080
    x5 : ffff00008de3b000 x4 : 0000000000000000 x3 : 0000000000000001
    x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
    Call trace:
     mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige]
     mlxbf_gige_poll+0x54/0x160 [mlxbf_gige]
     __napi_poll+0x40/0x1c8
     net_rx_action+0x314/0x3a0
     __do_softirq+0x128/0x334
     run_ksoftirqd+0x54/0x6c
     smpboot_thread_fn+0x14c/0x190
     kthread+0x10c/0x110
     ret_from_fork+0x10/0x20
    Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002)
    ---[ end trace 7cc3941aa0d8e6a4 ]---
    Kernel panic - not syncing: Oops: Fatal exception in interrupt
    Kernel Offset: 0x4ce722520000 from 0xffff800008000000
    PHYS_OFFSET: 0x80000000
    CPU features: 0x000005c1,a3330e5a
    Memory Limit: none
    ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
    
    During system shutdown, the mlxbf_gige driver's shutdown() is always executed.
    However, the driver's stop() method will only execute if networking interface
    configuration logic within the Linux distribution has been setup to do so.
    
    If shutdown() executes but stop() does not execute, NAPI remains enabled
    and this can lead to an exception if NAPI is scheduled while the hardware
    interface has only been partially deinitialized.
    
    The networking interface managed by the mlxbf_gige driver must be properly
    stopped during system shutdown so that IFF_UP is cleared, the hardware
    interface is put into a clean state, and NAPI is fully deinitialized.
    
    Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
    Signed-off-by: David Thompson <davthompson@nvidia.com>
    Link: https://lore.kernel.org/r/20240325210929.25362-1-davthompson@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mlxbf_gige: stop PHY during open() error paths [+ + +]

Author: David Thompson <davthompson@nvidia.com>
Date:   Wed Mar 20 15:31:17 2024 -0400

    mlxbf_gige: stop PHY during open() error paths
    
    [ Upstream commit d6c30c5a168f8586b8bcc0d8e42e2456eb05209b ]
    
    The mlxbf_gige_open() routine starts the PHY as part of normal
    initialization.  The mlxbf_gige_open() routine must stop the
    PHY during its error paths.
    
    Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
    Signed-off-by: David Thompson <davthompson@nvidia.com>
    Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

mm/secretmem: fix GUP-fast succeeding on secretmem folios [+ + +]

Author: David Hildenbrand <david@redhat.com>
Date:   Tue Mar 26 15:32:08 2024 +0100

    mm/secretmem: fix GUP-fast succeeding on secretmem folios
    
    commit 65291dcfcf8936e1b23cfd7718fdfde7cfaf7706 upstream.
    
    folio_is_secretmem() currently relies on secretmem folios being LRU
    folios, to save some cycles.
    
    However, folios might reside in a folio batch without the LRU flag set, or
    temporarily have their LRU flag cleared.  Consequently, the LRU flag is
    unreliable for this purpose.
    
    In particular, this is the case when secretmem_fault() allocates a fresh
    page and calls filemap_add_folio()->folio_add_lru().  The folio might be
    added to the per-cpu folio batch and won't get the LRU flag set until the
    batch was drained using e.g., lru_add_drain().
    
    Consequently, folio_is_secretmem() might not detect secretmem folios and
    GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel
    when we would later try reading/writing to the folio, because the folio
    has been unmapped from the directmap.
    
    Fix it by removing that unreliable check.
    
    Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com
    Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reported-by: xingwei lee <xrivendell7@gmail.com>
    Reported-by: yue sun <samsun1006219@gmail.com>
    Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/
    Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
    Tested-by: Miklos Szeredi <mszeredi@redhat.com>
    Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: don't account accept() of non-MPC client as fallback to TCP [+ + +]

Author: Davide Caratti <dcaratti@redhat.com>
Date:   Fri Mar 29 13:08:52 2024 +0100

    mptcp: don't account accept() of non-MPC client as fallback to TCP
    
    commit 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 upstream.
    
    Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
    accept non-MPC connections. As reported by Christoph, this is "surprising"
    because the counter might become greater than MPTcpExtMPCapableSYNRX.
    
    MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
    incremented when a connection was seen using MPTCP options, then a
    fallback to TCP has been done. Let's do that by incrementing it when
    the subflow context of an inbound MPC connection attempt is dropped.
    Also, update mptcp_connect.sh kselftest, to ensure that the
    above MIB does not increment in case a pure TCP client connects to a
    MPTCP server.
    
    Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
    Cc: stable@vger.kernel.org
    Reported-by: Christoph Paasch <cpaasch@apple.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
    Signed-off-by: Davide Caratti <dcaratti@redhat.com>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/rds: fix possible cp null dereference [+ + +]

Author: Mahmoud Adam <mngyadam@amazon.com>
Date:   Tue Mar 26 16:31:33 2024 +0100

    net/rds: fix possible cp null dereference
    
    commit 62fc3357e079a07a22465b9b6ef71bb6ea75ee4b upstream.
    
    cp might be null, calling cp->cp_conn would produce null dereference
    
    [Simon Horman adds:]
    
    Analysis:
    
    * cp is a parameter of __rds_rdma_map and is not reassigned.
    
    * The following call-sites pass a NULL cp argument to __rds_rdma_map()
    
      - rds_get_mr()
      - rds_get_mr_for_dest
    
    * Prior to the code above, the following assumes that cp may be NULL
      (which is indicative, but could itself be unnecessary)
    
            trans_private = rs->rs_transport->get_mr(
                    sg, nents, rs, &mr->r_key, cp ? cp->cp_conn : NULL,
                    args->vec.addr, args->vec.bytes,
                    need_odp ? ODP_ZEROBASED : ODP_NOT_NEEDED);
    
    * The code modified by this patch is guarded by IS_ERR(trans_private),
      where trans_private is assigned as per the previous point in this analysis.
    
      The only implementation of get_mr that I could locate is rds_ib_get_mr()
      which can return an ERR_PTR if the conn (4th) argument is NULL.
    
    * ret is set to PTR_ERR(trans_private).
      rds_ib_get_mr can return ERR_PTR(-ENODEV) if the conn (4th) argument is NULL.
      Thus ret may be -ENODEV in which case the code in question will execute.
    
    Conclusion:
    * cp may be NULL at the point where this patch adds a check;
      this patch does seem to address a possible bug
    
    Fixes: c055fc00c07b ("net/rds: fix WARNING in rds_conn_connect_if_down")
    Cc: stable@vger.kernel.org # v4.19+
    Signed-off-by: Mahmoud Adam <mngyadam@amazon.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240326153132.55580-1-mngyadam@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/sched: act_skbmod: prevent kernel-infoleak [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Apr 3 13:09:08 2024 +0000

    net/sched: act_skbmod: prevent kernel-infoleak
    
    commit d313eb8b77557a6d5855f42d2234bd592c7b50dd upstream.
    
    syzbot found that tcf_skbmod_dump() was copying four bytes
    from kernel stack to user space [1].
    
    The issue here is that 'struct tc_skbmod' has a four bytes hole.
    
    We need to clear the structure before filling fields.
    
    [1]
    BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
     BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline]
     BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline]
     BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
     BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline]
     BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
      instrument_copy_to_user include/linux/instrumented.h:114 [inline]
      copy_to_user_iter lib/iov_iter.c:24 [inline]
      iterate_ubuf include/linux/iov_iter.h:29 [inline]
      iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
      iterate_and_advance include/linux/iov_iter.h:271 [inline]
      _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185
      copy_to_iter include/linux/uio.h:196 [inline]
      simple_copy_to_iter net/core/datagram.c:532 [inline]
      __skb_datagram_iter+0x185/0x1000 net/core/datagram.c:420
      skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
      skb_copy_datagram_msg include/linux/skbuff.h:4050 [inline]
      netlink_recvmsg+0x432/0x1610 net/netlink/af_netlink.c:1962
      sock_recvmsg_nosec net/socket.c:1046 [inline]
      sock_recvmsg+0x2c4/0x340 net/socket.c:1068
      __sys_recvfrom+0x35a/0x5f0 net/socket.c:2242
      __do_sys_recvfrom net/socket.c:2260 [inline]
      __se_sys_recvfrom net/socket.c:2256 [inline]
      __x64_sys_recvfrom+0x126/0x1d0 net/socket.c:2256
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was stored to memory at:
      pskb_expand_head+0x30f/0x19d0 net/core/skbuff.c:2253
      netlink_trim+0x2c2/0x330 net/netlink/af_netlink.c:1317
      netlink_unicast+0x9f/0x1260 net/netlink/af_netlink.c:1351
      nlmsg_unicast include/net/netlink.h:1144 [inline]
      nlmsg_notify+0x21d/0x2f0 net/netlink/af_netlink.c:2610
      rtnetlink_send+0x73/0x90 net/core/rtnetlink.c:741
      rtnetlink_maybe_send include/linux/rtnetlink.h:17 [inline]
      tcf_add_notify net/sched/act_api.c:2048 [inline]
      tcf_action_add net/sched/act_api.c:2071 [inline]
      tc_ctl_action+0x146e/0x19d0 net/sched/act_api.c:2119
      rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
      netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
      rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
      netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
      netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
      netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
      __sys_sendmsg net/socket.c:2667 [inline]
      __do_sys_sendmsg net/socket.c:2676 [inline]
      __se_sys_sendmsg net/socket.c:2674 [inline]
      __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Uninit was stored to memory at:
      __nla_put lib/nlattr.c:1041 [inline]
      nla_put+0x1c6/0x230 lib/nlattr.c:1099
      tcf_skbmod_dump+0x23f/0xc20 net/sched/act_skbmod.c:256
      tcf_action_dump_old net/sched/act_api.c:1191 [inline]
      tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
      tcf_action_dump+0x1fd/0x460 net/sched/act_api.c:1251
      tca_get_fill+0x519/0x7a0 net/sched/act_api.c:1628
      tcf_add_notify_msg net/sched/act_api.c:2023 [inline]
      tcf_add_notify net/sched/act_api.c:2042 [inline]
      tcf_action_add net/sched/act_api.c:2071 [inline]
      tc_ctl_action+0x1365/0x19d0 net/sched/act_api.c:2119
      rtnetlink_rcv_msg+0x1737/0x1900 net/core/rtnetlink.c:6595
      netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2559
      rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6613
      netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
      netlink_unicast+0xf4c/0x1260 net/netlink/af_netlink.c:1361
      netlink_sendmsg+0x10df/0x11f0 net/netlink/af_netlink.c:1905
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0x30f/0x380 net/socket.c:745
      ____sys_sendmsg+0x877/0xb60 net/socket.c:2584
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
      __sys_sendmsg net/socket.c:2667 [inline]
      __do_sys_sendmsg net/socket.c:2676 [inline]
      __se_sys_sendmsg net/socket.c:2674 [inline]
      __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674
     do_syscall_64+0xd5/0x1f0
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    
    Local variable opt created at:
      tcf_skbmod_dump+0x9d/0xc20 net/sched/act_skbmod.c:244
      tcf_action_dump_old net/sched/act_api.c:1191 [inline]
      tcf_action_dump_1+0x85e/0x970 net/sched/act_api.c:1227
    
    Bytes 188-191 of 248 are uninitialized
    Memory access of size 248 starts at ffff888117697680
    Data copied to user address 00007ffe56d855f0
    
    Fixes: 86da71b57383 ("net_sched: Introduce skbmod action")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/r/20240403130908.93421-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/sched: fix lockdep splat in qdisc_tree_reduce_backlog() [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Apr 2 13:41:33 2024 +0000

    net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
    
    commit 7eb322360b0266481e560d1807ee79e0cef5742b upstream.
    
    qdisc_tree_reduce_backlog() is called with the qdisc lock held,
    not RTNL.
    
    We must use qdisc_lookup_rcu() instead of qdisc_lookup()
    
    syzbot reported:
    
    WARNING: suspicious RCU usage
    6.1.74-syzkaller #0 Not tainted
    -----------------------------
    net/sched/sch_api.c:305 suspicious rcu_dereference_protected() usage!
    
    other info that might help us debug this:
    
    rcu_scheduler_active = 2, debug_locks = 1
    3 locks held by udevd/1142:
      #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline]
      #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
      #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: net_tx_action+0x64a/0x970 net/core/dev.c:5282
      #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
      #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: net_tx_action+0x754/0x970 net/core/dev.c:5297
      #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline]
      #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
      #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: qdisc_tree_reduce_backlog+0x84/0x580 net/sched/sch_api.c:792
    
    stack backtrace:
    CPU: 1 PID: 1142 Comm: udevd Not tainted 6.1.74-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
    Call Trace:
     <TASK>
      [<ffffffff85b85f14>] __dump_stack lib/dump_stack.c:88 [inline]
      [<ffffffff85b85f14>] dump_stack_lvl+0x1b1/0x28f lib/dump_stack.c:106
      [<ffffffff85b86007>] dump_stack+0x15/0x1e lib/dump_stack.c:113
      [<ffffffff81802299>] lockdep_rcu_suspicious+0x1b9/0x260 kernel/locking/lockdep.c:6592
      [<ffffffff84f0054c>] qdisc_lookup+0xac/0x6f0 net/sched/sch_api.c:305
      [<ffffffff84f037c3>] qdisc_tree_reduce_backlog+0x243/0x580 net/sched/sch_api.c:811
      [<ffffffff84f5b78c>] pfifo_tail_enqueue+0x32c/0x4b0 net/sched/sch_fifo.c:51
      [<ffffffff84fbcf63>] qdisc_enqueue include/net/sch_generic.h:833 [inline]
      [<ffffffff84fbcf63>] netem_dequeue+0xeb3/0x15d0 net/sched/sch_netem.c:723
      [<ffffffff84eecab9>] dequeue_skb net/sched/sch_generic.c:292 [inline]
      [<ffffffff84eecab9>] qdisc_restart net/sched/sch_generic.c:397 [inline]
      [<ffffffff84eecab9>] __qdisc_run+0x249/0x1e60 net/sched/sch_generic.c:415
      [<ffffffff84d7aa96>] qdisc_run+0xd6/0x260 include/net/pkt_sched.h:125
      [<ffffffff84d85d29>] net_tx_action+0x7c9/0x970 net/core/dev.c:5313
      [<ffffffff85e002bd>] __do_softirq+0x2bd/0x9bd kernel/softirq.c:616
      [<ffffffff81568bca>] invoke_softirq kernel/softirq.c:447 [inline]
      [<ffffffff81568bca>] __irq_exit_rcu+0xca/0x230 kernel/softirq.c:700
      [<ffffffff81568ae9>] irq_exit_rcu+0x9/0x20 kernel/softirq.c:712
      [<ffffffff85b89f52>] sysvec_apic_timer_interrupt+0x42/0x90 arch/x86/kernel/apic/apic.c:1107
      [<ffffffff85c00ccb>] asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:656
    
    Fixes: d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Jiri Pirko <jiri@nvidia.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/r/20240402134133.2352776-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: fec: Set mac_managed_pm during probe [+ + +]

Author: Wei Fang <wei.fang@nxp.com>
Date:   Thu Mar 28 15:59:29 2024 +0000

    net: fec: Set mac_managed_pm during probe
    
    [ Upstream commit cbc17e7802f5de37c7c262204baadfad3f7f99e5 ]
    
    Setting mac_managed_pm during interface up is too late.
    
    In situations where the link is not brought up yet and the system suspends
    the regular PHY power management will run. Since the FEC ETHEREN control
    bit is cleared (automatically) on suspend the controller is off in resume.
    When the regular PHY power management resume path runs in this context it
    will write to the MII_DATA register but nothing will be transmitted on the
    MDIO bus.
    
    This can be observed by the following log:
    
        fec 5b040000.ethernet eth0: MDIO read timeout
        Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
        Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110
    
    The data written will however remain in the MII_DATA register.
    
    When the link later is set to administrative up it will trigger a call to
    fec_restart() which will restore the MII_SPEED register. This triggers the
    quirk explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt
    driven MDIO with polled IO") causing an extra MII_EVENT.
    
    This extra event desynchronizes all the MDIO register reads, causing them
    to complete too early. Leading all reads to read as 0 because
    fec_enet_mdio_wait() returns too early.
    
    When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes
    the PHY to be initialized incorrectly and the PHY will not transmit any
    ethernet signal in this state. It cannot be brought out of this state
    without a power cycle of the PHY.
    
    Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
    Closes: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/
    Signed-off-by: Wei Fang <wei.fang@nxp.com>
    [jernberg: commit message]
    Signed-off-by: John Ernberg <john.ernberg@actia.se>
    Link: https://lore.kernel.org/r/20240328155909.59613-2-john.ernberg@actia.se
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: hns3: fix index limit to support all queue stats [+ + +]

Author: Jie Wang <wangjie125@huawei.com>
Date:   Mon Mar 25 20:43:09 2024 +0800

    net: hns3: fix index limit to support all queue stats
    
    [ Upstream commit 47e39d213e09c6cae0d6b4d95e454ea404013312 ]
    
    Currently, hns hardware supports more than 512 queues and the index limit
    in hclge_comm_tqps_update_stats is wrong. So this patch removes it.
    
    Fixes: 287db5c40d15 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse")
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: hns3: fix kernel crash when devlink reload during pf initialization [+ + +]

Author: Yonglong Liu <liuyonglong@huawei.com>
Date:   Mon Mar 25 20:43:10 2024 +0800

    net: hns3: fix kernel crash when devlink reload during pf initialization
    
    [ Upstream commit 93305b77ffcb042f1538ecc383505e87d95aa05a ]
    
    The devlink reload process will access the hardware resources,
    but the register operation is done before the hardware is initialized.
    So, processing the devlink reload during initialization may lead to kernel
    crash. This patch fixes this by taking devl_lock during initialization.
    
    Fixes: b741269b2759 ("net: hns3: add support for registering devlink for PF")
    Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: hns3: mark unexcuted loopback test result as UNEXECUTED [+ + +]

Author: Jian Shen <shenjian15@huawei.com>
Date:   Mon Mar 25 20:43:11 2024 +0800

    net: hns3: mark unexcuted loopback test result as UNEXECUTED
    
    [ Upstream commit 5bd088d6c21a45ee70e6116879310e54174d75eb ]
    
    Currently, loopback test may be skipped when resetting, but the test
    result will still show as 'PASS', because the driver doesn't set
    ETH_TEST_FL_FAILED flag. Fix it by setting the flag and
    initializating the value to UNEXECUTED.
    
    Fixes: 4c8dab1c709c ("net: hns3: reconstruct function hns3_self_test")
    Signed-off-by: Jian Shen <shenjian15@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips [+ + +]

Author: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Date:   Tue Mar 26 12:28:05 2024 +0530

    net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
    
    [ Upstream commit e4a58989f5c839316ac63675e8800b9eed7dbe96 ]
    
    PCI11x1x Rev B0 devices might drop packets when receiving back to back frames
    at 2.5G link speed. Change the B0 Rev device's Receive filtering Engine FIFO
    threshold parameter from its hardware default of 4 to 3 dwords to prevent the
    problem. Rev C0 and later hardware already defaults to 3 dwords.
    
    Fixes: bb4f6bffe33c ("net: lan743x: Add PCI11010 / PCI11414 device IDs")
    Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240326065805.686128-1-Raju.Lakkaraju@microchip.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: phy: micrel: Fix potential null pointer dereference [+ + +]

Author: Aleksandr Mishin <amishin@t-argos.ru>
Date:   Fri Mar 29 09:16:31 2024 +0300

    net: phy: micrel: Fix potential null pointer dereference
    
    commit 96c155943a703f0655c0c4cab540f67055960e91 upstream.
    
    In lan8814_get_sig_rx() and lan8814_get_sig_tx() ptp_parse_header() may
    return NULL as ptp_header due to abnormal packet type or corrupted packet.
    Fix this bug by adding ptp_header check.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
    Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Link: https://lore.kernel.org/r/20240329061631.33199-1-amishin@t-argos.ru
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping [+ + +]

Author: Horatiu Vultur <horatiu.vultur@microchip.com>
Date:   Tue Apr 2 09:16:34 2024 +0200

    net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
    
    commit de99e1ea3a35f23ff83a31d6b08f43d27b2c6345 upstream.
    
    There are 2 issues with the blamed commit.
    1. When the phy is initialized, it would enable the disabled of UDPv4
       checksums. The UDPv6 checksum is already enabled by default. So when
       1-step is configured then it would clear these flags.
    2. After the 1-step is configured, then if 2-step is configured then the
       1-step would be still configured because it is not clearing the flag.
       So the sync frames will still have origin timestamps set.
    
    Fix this by reading first the value of the register and then
    just change bit 12 as this one determines if the timestamp needs to
    be inserted in the frame, without changing any other bits.
    
    Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
    Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
    Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
    Link: https://lore.kernel.org/r/20240402071634.2483524-1-horatiu.vultur@microchip.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: ravb: Always process TX descriptor ring [+ + +]

Author: Paul Barker <paul.barker.ct@bp.renesas.com>
Date:   Tue Apr 2 15:53:04 2024 +0100

    net: ravb: Always process TX descriptor ring
    
    [ Upstream commit 596a4254915f94c927217fe09c33a6828f33fb25 ]
    
    The TX queue should be serviced each time the poll function is called,
    even if the full RX work budget has been consumed. This prevents
    starvation of the TX queue when RX bandwidth usage is high.
    
    Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
    Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Link: https://lore.kernel.org/r/20240402145305.82148-1-paul.barker.ct@bp.renesas.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ravb: Always update error counters [+ + +]

Author: Paul Barker <paul.barker.ct@bp.renesas.com>
Date:   Tue Apr 2 15:53:05 2024 +0100

    net: ravb: Always update error counters
    
    [ Upstream commit 101b76418d7163240bc74a7e06867dca0e51183e ]
    
    The error statistics should be updated each time the poll function is
    called, even if the full RX work budget has been consumed. This prevents
    the counts from becoming stuck when RX bandwidth usage is high.
    
    This also ensures that error counters are not updated after we've
    re-enabled interrupts as that could result in a race condition.
    
    Also drop an unnecessary space.
    
    Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
    Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Link: https://lore.kernel.org/r/20240402145305.82148-2-paul.barker.ct@bp.renesas.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ravb: Let IP-specific receive function to interrogate descriptors [+ + +]

Author: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Date:   Fri Feb 2 10:41:22 2024 +0200

    net: ravb: Let IP-specific receive function to interrogate descriptors
    
    [ Upstream commit 2b993bfdb47b3aaafd8fe9cd5038b5e297b18ee1 ]
    
    ravb_poll() initial code used to interrogate the first descriptor of the
    RX queue in case gPTP is false to determine if ravb_rx() should be called.
    This is done for non-gPTP IPs. For gPTP IPs the driver PTP-specific
    information was used to determine if receive function should be called. As
    every IP has its own receive function that interrogates the RX descriptors
    list in the same way the ravb_poll() was doing there is no need to double
    check this in ravb_poll(). Removing the code from ravb_poll() leads to a
    cleaner code.
    
    Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Stable-dep-of: 596a4254915f ("net: ravb: Always process TX descriptor ring")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: stmmac: fix rx queue priority assignment [+ + +]

Author: Piotr Wejman <piotrwejman90@gmail.com>
Date:   Mon Apr 1 21:22:39 2024 +0200

    net: stmmac: fix rx queue priority assignment
    
    commit b3da86d432b7cd65b025a11f68613e333d2483db upstream.
    
    The driver should ensure that same priority is not mapped to multiple
    rx queues. From DesignWare Cores Ethernet Quality-of-Service
    Databook, section 17.1.29 MAC_RxQ_Ctrl2:
    "[...]The software must ensure that the content of this field is
    mutually exclusive to the PSRQ fields for other queues, that is,
    the same priority is not mapped to multiple Rx queues[...]"
    
    Previously rx_queue_priority() function was:
    - clearing all priorities from a queue
    - adding new priorities to that queue
    After this patch it will:
    - first assign new priorities to a queue
    - then remove those priorities from all other queues
    - keep other priorities previously assigned to that queue
    
    Fixes: a8f5102af2a7 ("net: stmmac: TX and RX queue priority configuration")
    Fixes: 2142754f8b9c ("net: stmmac: Add MAC related callbacks for XGMAC2")
    Signed-off-by: Piotr Wejman <piotrwejman90@gmail.com>
    Link: https://lore.kernel.org/r/20240401192239.33942-1-piotrwejman90@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: usb: ax88179_178a: avoid the interface always configured as random address [+ + +]

Author: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Date:   Wed Apr 3 15:21:58 2024 +0200

    net: usb: ax88179_178a: avoid the interface always configured as random address
    
    commit 2e91bb99b9d4f756e92e83c4453f894dda220f09 upstream.
    
    After the commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two
    consecutive device resets"), reset is not executed from bind operation and
    mac address is not read from the device registers or the devicetree at that
    moment. Since the check to configure if the assigned mac address is random
    or not for the interface, happens after the bind operation from
    usbnet_probe, the interface keeps configured as random address, although the
    address is correctly read and set during open operation (the only reset
    now).
    
    In order to keep only one reset for the device and to avoid the interface
    always configured as random address, after reset, configure correctly the
    suitable field from the driver, if the mac address is read successfully from
    the device registers or the devicetree. Take into account if a locally
    administered address (random) was previously stored.
    
    cc: stable@vger.kernel.org # 6.6+
    Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets")
    Reported-by: Dave Stevenson  <dave.stevenson@raspberrypi.com>
    Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240403132158.344838-1-jtornosm@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: wwan: t7xx: Split 64bit accesses to fix alignment issues [+ + +]

Author: Bjц╦rn Mork <bjorn@mork.no>
Date:   Fri Mar 22 15:40:00 2024 +0100

    net: wwan: t7xx: Split 64bit accesses to fix alignment issues
    
    [ Upstream commit 7d5a7dd5a35876f0ecc286f3602a88887a788217 ]
    
    Some of the registers are aligned on a 32bit boundary, causing
    alignment faults on 64bit platforms.
    
     Unable to handle kernel paging request at virtual address ffffffc084a1d004
     Mem abort info:
     ESR = 0x0000000096000061
     EC = 0x25: DABT (current EL), IL = 32 bits
     SET = 0, FnV = 0
     EA = 0, S1PTW = 0
     FSC = 0x21: alignment fault
     Data abort info:
     ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000
     CM = 0, WnR = 1, TnD = 0, TagAccess = 0
     GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
     swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000046ad6000
     [ffffffc084a1d004] pgd=100000013ffff003, p4d=100000013ffff003, pud=100000013ffff003, pmd=0068000020a00711
     Internal error: Oops: 0000000096000061 [#1] SMP
     Modules linked in: mtk_t7xx(+) qcserial pppoe ppp_async option nft_fib_inet nf_flow_table_inet mt7921u(O) mt7921s(O) mt7921e(O) mt7921_common(O) iwlmvm(O) iwldvm(O) usb_wwan rndis_host qmi_wwan pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7996e(O) mt792x_usb(O) mt792x_lib(O) mt7915e(O) mt76_usb(O) mt76_sdio(O) mt76_connac_lib(O) mt76(O) mac80211(O) iwlwifi(O) huawei_cdc_ncm cfg80211(O) cdc_ncm cdc_ether wwan usbserial usbnet slhc sfp rtc_pcf8563 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mt6577_auxadc mdio_i2c libcrc32c compat(O) cdc_wdm cdc_acm at24 crypto_safexcel pwm_fan i2c_gpio i2c_smbus industrialio i2c_algo_bit i2c_mux_reg i2c_mux_pca954x i2c_mux_pca9541 i2c_mux_gpio i2c_mux dummy oid_registry tun sha512_arm64 sha1_ce sha1_generic seqiv
     md5 geniv des_generic libdes cbc authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd nvme nvme_core gpio_button_hotplug(O) dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax usbcore usb_common ptp aquantia pps_core mii tpm encrypted_keys trusted
     CPU: 3 PID: 5266 Comm: kworker/u9:1 Tainted: G O 6.6.22 #0
     Hardware name: Bananapi BPI-R4 (DT)
     Workqueue: md_hk_wq t7xx_fsm_uninit [mtk_t7xx]
     pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx]
     lr : t7xx_cldma_start+0xac/0x13c [mtk_t7xx]
     sp : ffffffc085d63d30
     x29: ffffffc085d63d30 x28: 0000000000000000 x27: 0000000000000000
     x26: 0000000000000000 x25: ffffff80c804f2c0 x24: ffffff80ca196c05
     x23: 0000000000000000 x22: ffffff80c814b9b8 x21: ffffff80c814b128
     x20: 0000000000000001 x19: ffffff80c814b080 x18: 0000000000000014
     x17: 0000000055c9806b x16: 000000007c5296d0 x15: 000000000f6bca68
     x14: 00000000dbdbdce4 x13: 000000001aeaf72a x12: 0000000000000001
     x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
     x8 : ffffff80ca1ef6b4 x7 : ffffff80c814b818 x6 : 0000000000000018
     x5 : 0000000000000870 x4 : 0000000000000000 x3 : 0000000000000000
     x2 : 000000010a947000 x1 : ffffffc084a1d004 x0 : ffffffc084a1d004
     Call trace:
     t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx]
     t7xx_fsm_uninit+0x578/0x5ec [mtk_t7xx]
     process_one_work+0x154/0x2a0
     worker_thread+0x2ac/0x488
     kthread+0xe0/0xec
     ret_from_fork+0x10/0x20
     Code: f9400800 91001000 8b214001 d50332bf (f9000022)
     ---[ end trace 0000000000000000 ]---
    
    The inclusion of io-64-nonatomic-lo-hi.h indicates that all 64bit
    accesses can be replaced by pairs of nonatomic 32bit access.  Fix
    alignment by forcing all accesses to be 32bit on 64bit platforms.
    
    Link: https://forum.openwrt.org/t/fibocom-fm350-gl-support/142682/72
    Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
    Signed-off-by: Bjц╦rn Mork <bjorn@mork.no>
    Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
    Tested-by: Liviu Dudau <liviu@dudau.co.uk>
    Link: https://lore.kernel.org/r/20240322144000.1683822-1-bjorn@mork.no
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get() [+ + +]

Author: Ziyang Xuan <william.xuanziyang@huawei.com>
Date:   Wed Apr 3 15:22:04 2024 +0800

    netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
    
    commit 24225011d81b471acc0e1e315b7d9905459a6304 upstream.
    
    nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
    concurrent with __nft_flowtable_type_get() within nf_tables_newflowtable().
    And thhere is not any protection when iterate over nf_tables_flowtables
    list in __nft_flowtable_type_get(). Therefore, there is pertential
    data-race of nf_tables_flowtables list entry.
    
    Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
    in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
    nft_flowtable_type_get() to protect the entire type query process.
    
    Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
    Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nf_tables: flush pending destroy work before exit_net release [+ + +]

Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Apr 2 18:04:36 2024 +0200

    netfilter: nf_tables: flush pending destroy work before exit_net release
    
    commit 24cea9677025e0de419989ecb692acd4bb34cac2 upstream.
    
    Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy
    work before netlink notifier") to address a race between exit_net and
    the destroy workqueue.
    
    The trace below shows an element to be released via destroy workqueue
    while exit_net path (triggered via module removal) has already released
    the set that is used in such transaction.
    
    [ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
    [ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
    [ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
    [ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
    [ 1360.547984] Call Trace:
    [ 1360.547991]  <TASK>
    [ 1360.547998]  dump_stack_lvl+0x53/0x70
    [ 1360.548014]  print_report+0xc4/0x610
    [ 1360.548026]  ? __virt_addr_valid+0xba/0x160
    [ 1360.548040]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
    [ 1360.548054]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
    [ 1360.548176]  kasan_report+0xae/0xe0
    [ 1360.548189]  ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
    [ 1360.548312]  nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
    [ 1360.548447]  ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
    [ 1360.548577]  ? _raw_spin_unlock_irq+0x18/0x30
    [ 1360.548591]  process_one_work+0x2f1/0x670
    [ 1360.548610]  worker_thread+0x4d3/0x760
    [ 1360.548627]  ? __pfx_worker_thread+0x10/0x10
    [ 1360.548640]  kthread+0x16b/0x1b0
    [ 1360.548653]  ? __pfx_kthread+0x10/0x10
    [ 1360.548665]  ret_from_fork+0x2f/0x50
    [ 1360.548679]  ? __pfx_kthread+0x10/0x10
    [ 1360.548690]  ret_from_fork_asm+0x1a/0x30
    [ 1360.548707]  </TASK>
    
    [ 1360.548719] Allocated by task 192061:
    [ 1360.548726]  kasan_save_stack+0x20/0x40
    [ 1360.548739]  kasan_save_track+0x14/0x30
    [ 1360.548750]  __kasan_kmalloc+0x8f/0xa0
    [ 1360.548760]  __kmalloc_node+0x1f1/0x450
    [ 1360.548771]  nf_tables_newset+0x10c7/0x1b50 [nf_tables]
    [ 1360.548883]  nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
    [ 1360.548909]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
    [ 1360.548927]  netlink_unicast+0x367/0x4f0
    [ 1360.548935]  netlink_sendmsg+0x34b/0x610
    [ 1360.548944]  ____sys_sendmsg+0x4d4/0x510
    [ 1360.548953]  ___sys_sendmsg+0xc9/0x120
    [ 1360.548961]  __sys_sendmsg+0xbe/0x140
    [ 1360.548971]  do_syscall_64+0x55/0x120
    [ 1360.548982]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
    
    [ 1360.548994] Freed by task 192222:
    [ 1360.548999]  kasan_save_stack+0x20/0x40
    [ 1360.549009]  kasan_save_track+0x14/0x30
    [ 1360.549019]  kasan_save_free_info+0x3b/0x60
    [ 1360.549028]  poison_slab_object+0x100/0x180
    [ 1360.549036]  __kasan_slab_free+0x14/0x30
    [ 1360.549042]  kfree+0xb6/0x260
    [ 1360.549049]  __nft_release_table+0x473/0x6a0 [nf_tables]
    [ 1360.549131]  nf_tables_exit_net+0x170/0x240 [nf_tables]
    [ 1360.549221]  ops_exit_list+0x50/0xa0
    [ 1360.549229]  free_exit_list+0x101/0x140
    [ 1360.549236]  unregister_pernet_operations+0x107/0x160
    [ 1360.549245]  unregister_pernet_subsys+0x1c/0x30
    [ 1360.549254]  nf_tables_module_exit+0x43/0x80 [nf_tables]
    [ 1360.549345]  __do_sys_delete_module+0x253/0x370
    [ 1360.549352]  do_syscall_64+0x55/0x120
    [ 1360.549360]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
    
    (gdb) list *__nft_release_table+0x473
    0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
    11349           list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
    11350                   list_del(&flowtable->list);
    11351                   nft_use_dec(&table->use);
    11352                   nf_tables_flowtable_destroy(flowtable);
    11353           }
    11354           list_for_each_entry_safe(set, ns, &table->sets, list) {
    11355                   list_del(&set->list);
    11356                   nft_use_dec(&table->use);
    11357                   if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
    11358                           nft_map_deactivate(&ctx, set);
    (gdb)
    
    [ 1360.549372] Last potentially related work creation:
    [ 1360.549376]  kasan_save_stack+0x20/0x40
    [ 1360.549384]  __kasan_record_aux_stack+0x9b/0xb0
    [ 1360.549392]  __queue_work+0x3fb/0x780
    [ 1360.549399]  queue_work_on+0x4f/0x60
    [ 1360.549407]  nft_rhash_remove+0x33b/0x340 [nf_tables]
    [ 1360.549516]  nf_tables_commit+0x1c6a/0x2620 [nf_tables]
    [ 1360.549625]  nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink]
    [ 1360.549647]  nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
    [ 1360.549671]  netlink_unicast+0x367/0x4f0
    [ 1360.549680]  netlink_sendmsg+0x34b/0x610
    [ 1360.549690]  ____sys_sendmsg+0x4d4/0x510
    [ 1360.549697]  ___sys_sendmsg+0xc9/0x120
    [ 1360.549706]  __sys_sendmsg+0xbe/0x140
    [ 1360.549715]  do_syscall_64+0x55/0x120
    [ 1360.549725]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
    
    Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: nf_tables: reject new basechain after table flag update [+ + +]

Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Mon Apr 1 00:33:02 2024 +0200

    netfilter: nf_tables: reject new basechain after table flag update
    
    commit 994209ddf4f430946f6247616b2e33d179243769 upstream.
    
    When dormant flag is toggled, hooks are disabled in the commit phase by
    iterating over current chains in table (existing and new).
    
    The following configuration allows for an inconsistent state:
    
      add table x
      add chain x y { type filter hook input priority 0; }
      add table x { flags dormant; }
      add chain x w { type filter hook input priority 1; }
    
    which triggers the following warning when trying to unregister chain w
    which is already unregistered.
    
    [  127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:50                                                                     1 __nf_unregister_net_hook+0x21a/0x260
    [...]
    [  127.322519] Call Trace:
    [  127.322521]  <TASK>
    [  127.322524]  ? __warn+0x9f/0x1a0
    [  127.322531]  ? __nf_unregister_net_hook+0x21a/0x260
    [  127.322537]  ? report_bug+0x1b1/0x1e0
    [  127.322545]  ? handle_bug+0x3c/0x70
    [  127.322552]  ? exc_invalid_op+0x17/0x40
    [  127.322556]  ? asm_exc_invalid_op+0x1a/0x20
    [  127.322563]  ? kasan_save_free_info+0x3b/0x60
    [  127.322570]  ? __nf_unregister_net_hook+0x6a/0x260
    [  127.322577]  ? __nf_unregister_net_hook+0x21a/0x260
    [  127.322583]  ? __nf_unregister_net_hook+0x6a/0x260
    [  127.322590]  ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables]
    [  127.322655]  nft_table_disable+0x75/0xf0 [nf_tables]
    [  127.322717]  nf_tables_commit+0x2571/0x2620 [nf_tables]
    
    Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

netfilter: validate user input for expected length [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Apr 4 12:20:51 2024 +0000

    netfilter: validate user input for expected length
    
    commit 0c83842df40f86e529db6842231154772c20edcc upstream.
    
    I got multiple syzbot reports showing old bugs exposed
    by BPF after commit 20f2505fb436 ("bpf: Try to avoid kzalloc
    in cgroup/{s,g}etsockopt")
    
    setsockopt() @optlen argument should be taken into account
    before copying data.
    
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
     BUG: KASAN: slab-out-of-bounds in do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
     BUG: KASAN: slab-out-of-bounds in do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
    Read of size 96 at addr ffff88802cd73da0 by task syz-executor.4/7238
    
    CPU: 1 PID: 7238 Comm: syz-executor.4 Not tainted 6.9.0-rc2-next-20240403-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
      print_address_description mm/kasan/report.c:377 [inline]
      print_report+0x169/0x550 mm/kasan/report.c:488
      kasan_report+0x143/0x180 mm/kasan/report.c:601
      kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
      __asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
      copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
      copy_from_sockptr include/linux/sockptr.h:55 [inline]
      do_replace net/ipv4/netfilter/ip_tables.c:1111 [inline]
      do_ipt_set_ctl+0x902/0x3dd0 net/ipv4/netfilter/ip_tables.c:1627
      nf_setsockopt+0x295/0x2c0 net/netfilter/nf_sockopt.c:101
      do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x72/0x7a
    RIP: 0033:0x7fd22067dde9
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007fd21f9ff0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00007fd2207abf80 RCX: 00007fd22067dde9
    RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000003
    RBP: 00007fd2206ca47a R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000020000880 R11: 0000000000000246 R12: 0000000000000000
    R13: 000000000000000b R14: 00007fd2207abf80 R15: 00007ffd2d0170d8
     </TASK>
    
    Allocated by task 7238:
      kasan_save_stack mm/kasan/common.c:47 [inline]
      kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
      poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
      __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
      kasan_kmalloc include/linux/kasan.h:211 [inline]
      __do_kmalloc_node mm/slub.c:4069 [inline]
      __kmalloc_noprof+0x200/0x410 mm/slub.c:4082
      kmalloc_noprof include/linux/slab.h:664 [inline]
      __cgroup_bpf_run_filter_setsockopt+0xd47/0x1050 kernel/bpf/cgroup.c:1869
      do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfb/0x240
     entry_SYSCALL_64_after_hwframe+0x72/0x7a
    
    The buggy address belongs to the object at ffff88802cd73da0
     which belongs to the cache kmalloc-8 of size 8
    The buggy address is located 0 bytes inside of
     allocated 1-byte region [ffff88802cd73da0, ffff88802cd73da1)
    
    The buggy address belongs to the physical page:
    page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802cd73020 pfn:0x2cd73
    flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
    page_type: 0xffffefff(slab)
    raw: 00fff80000000000 ffff888015041280 dead000000000100 dead000000000122
    raw: ffff88802cd73020 000000008080007f 00000001ffffefff 0000000000000000
    page dumped because: kasan: bad access detected
    page_owner tracks the page as allocated
    page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5103, tgid 2119833701 (syz-executor.4), ts 5103, free_ts 70804600828
      set_page_owner include/linux/page_owner.h:32 [inline]
      post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1490
      prep_new_page mm/page_alloc.c:1498 [inline]
      get_page_from_freelist+0x2e7e/0x2f40 mm/page_alloc.c:3454
      __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4712
      __alloc_pages_node_noprof include/linux/gfp.h:244 [inline]
      alloc_pages_node_noprof include/linux/gfp.h:271 [inline]
      alloc_slab_page+0x5f/0x120 mm/slub.c:2249
      allocate_slab+0x5a/0x2e0 mm/slub.c:2412
      new_slab mm/slub.c:2465 [inline]
      ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3615
      __slab_alloc+0x58/0xa0 mm/slub.c:3705
      __slab_alloc_node mm/slub.c:3758 [inline]
      slab_alloc_node mm/slub.c:3936 [inline]
      __do_kmalloc_node mm/slub.c:4068 [inline]
      kmalloc_node_track_caller_noprof+0x286/0x450 mm/slub.c:4089
      kstrdup+0x3a/0x80 mm/util.c:62
      device_rename+0xb5/0x1b0 drivers/base/core.c:4558
      dev_change_name+0x275/0x860 net/core/dev.c:1232
      do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2864
      __rtnl_newlink net/core/rtnetlink.c:3680 [inline]
      rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3727
      rtnetlink_rcv_msg+0x89b/0x10d0 net/core/rtnetlink.c:6594
      netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
      netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
      netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
    page last free pid 5146 tgid 5146 stack trace:
      reset_page_owner include/linux/page_owner.h:25 [inline]
      free_pages_prepare mm/page_alloc.c:1110 [inline]
      free_unref_page+0xd3c/0xec0 mm/page_alloc.c:2617
      discard_slab mm/slub.c:2511 [inline]
      __put_partials+0xeb/0x130 mm/slub.c:2980
      put_cpu_partial+0x17c/0x250 mm/slub.c:3055
      __slab_free+0x2ea/0x3d0 mm/slub.c:4254
      qlink_free mm/kasan/quarantine.c:163 [inline]
      qlist_free_all+0x9e/0x140 mm/kasan/quarantine.c:179
      kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
      __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
      kasan_slab_alloc include/linux/kasan.h:201 [inline]
      slab_post_alloc_hook mm/slub.c:3888 [inline]
      slab_alloc_node mm/slub.c:3948 [inline]
      __do_kmalloc_node mm/slub.c:4068 [inline]
      __kmalloc_node_noprof+0x1d7/0x450 mm/slub.c:4076
      kmalloc_node_noprof include/linux/slab.h:681 [inline]
      kvmalloc_node_noprof+0x72/0x190 mm/util.c:634
      bucket_table_alloc lib/rhashtable.c:186 [inline]
      rhashtable_rehash_alloc+0x9e/0x290 lib/rhashtable.c:367
      rht_deferred_worker+0x4e1/0x2440 lib/rhashtable.c:427
      process_one_work kernel/workqueue.c:3218 [inline]
      process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
      worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
      kthread+0x2f0/0x390 kernel/kthread.c:388
      ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
      ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
    
    Memory state around the buggy address:
     ffff88802cd73c80: 07 fc fc fc 05 fc fc fc 05 fc fc fc fa fc fc fc
     ffff88802cd73d00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
    >ffff88802cd73d80: fa fc fc fc 01 fc fc fc fa fc fc fc fa fc fc fc
                                   ^
     ffff88802cd73e00: fa fc fc fc fa fc fc fc 05 fc fc fc 07 fc fc fc
     ffff88802cd73e80: 07 fc fc fc 07 fc fc fc 07 fc fc fc 07 fc fc fc
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Link: https://lore.kernel.org/r/20240404122051.2303764-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet [+ + +]

Author: Ryosuke Yasuoka <ryasuoka@redhat.com>
Date:   Wed Mar 20 09:54:10 2024 +0900

    nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
    
    [ Upstream commit d24b03535e5eb82e025219c2f632b485409c898f ]
    
    syzbot reported the following uninit-value access issue [1][2]:
    
    nci_rx_work() parses and processes received packet. When the payload
    length is zero, each message type handler reads uninitialized payload
    and KMSAN detects this issue. The receipt of a packet with a zero-size
    payload is considered unexpected, and therefore, such packets should be
    silently discarded.
    
    This patch resolved this issue by checking payload size before calling
    each message type handler codes.
    
    Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
    Reported-and-tested-by: syzbot+7ea9413ea6749baf5574@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+29b5ca705d2e0f4a44d2@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=7ea9413ea6749baf5574 [1]
    Closes: https://syzkaller.appspot.com/bug?extid=29b5ca705d2e0f4a44d2 [2]
    Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
    Reviewed-by: Jeremy Cline <jeremy@jcline.org>
    Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

nfsd: hold a lighter-weight client reference over CB_RECALL_ANY [+ + +]

Author: Jeff Layton <jlayton@kernel.org>
Date:   Fri Apr 5 13:56:18 2024 -0400

    nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
    
    [ Upstream commit 10396f4df8b75ff6ab0aa2cd74296565466f2c8d ]
    
    Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
    client. While a callback job is technically an RPC that counter is
    really more for client-driven RPCs, and this has the effect of
    preventing the client from being unhashed until the callback completes.
    
    If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
    can end up in a situation where the callback can't complete on the (now
    dead) callback channel, but the new client can't connect because the old
    client can't be unhashed. This usually manifests as a NFS4ERR_DELAY
    return on the CREATE_SESSION operation.
    
    The job is only holding a reference to the client so it can clear a flag
    after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a
    reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of
    reference when dealing with the nfsdfs info files, but it should work
    appropriately here to ensure that the nfs4_client doesn't disappear.
    
    Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
    Reported-by: Vladimir Benes <vbenes@redhat.com>
    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

nvme: fix miss command type check [+ + +]

Author: min15.li <min15.li@samsung.com>
Date:   Fri May 26 17:06:56 2023 +0000

    nvme: fix miss command type check
    
    commit 31a5978243d24d77be4bacca56c78a0fbc43b00d upstream.
    
    In the function nvme_passthru_end(), only the value of the command
    opcode is checked, without checking the command type (IO command or
    Admin command). When we send a Dataset Management command (The opcode
    of the Dataset Management command is the same as the Set Feature
    command), kernel thinks it is a set feature command, then sets the
    controller's keep alive interval, and calls nvme_keep_alive_work().
    
    Signed-off-by: min15.li <min15.li@samsung.com>
    Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
    Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

octeontx2-af: Add array index check [+ + +]

Author: Aleksandr Mishin <amishin@t-argos.ru>
Date:   Thu Mar 28 19:55:05 2024 +0300

    octeontx2-af: Add array index check
    
    commit ef15ddeeb6bee87c044bf7754fac524545bf71e8 upstream.
    
    In rvu_map_cgx_lmac_pf() the 'iter', which is used as an array index, can reach
    value (up to 14) that exceed the size (MAX_LMAC_COUNT = 8) of the array.
    Fix this bug by adding 'iter' value check.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support")
    Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

octeontx2-af: Fix issue with loading coalesced KPU profiles [+ + +]

Author: Hariprasad Kelam <hkelam@marvell.com>
Date:   Tue Mar 26 17:51:49 2024 +0530

    octeontx2-af: Fix issue with loading coalesced KPU profiles
    
    commit 0ba80d96585662299d4ea4624043759ce9015421 upstream.
    
    The current implementation for loading coalesced KPU profiles has
    a limitation.  The "offset" field, which is used to locate profiles
    within the profile is restricted to a u16.
    
    This restricts the number of profiles that can be loaded. This patch
    addresses this limitation by increasing the size of the "offset" field.
    
    Fixes: 11c730bfbf5b ("octeontx2-af: support for coalescing KPU profiles")
    Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Octeontx2-af: fix pause frame configuration in GMP mode [+ + +]

Author: Hariprasad Kelam <hkelam@marvell.com>
Date:   Tue Mar 26 10:57:20 2024 +0530

    Octeontx2-af: fix pause frame configuration in GMP mode
    
    [ Upstream commit 40d4b4807cadd83fb3f46cc8cd67a945b5b25461 ]
    
    The Octeontx2 MAC block (CGX) has separate data paths (SMU and GMP) for
    different speeds, allowing for efficient data transfer.
    
    The previous patch which added pause frame configuration has a bug due
    to which pause frame feature is not working in GMP mode.
    
    This patch fixes the issue by configurating appropriate registers.
    
    Fixes: f7e086e754fe ("octeontx2-af: Pause frame configuration at cgx")
    Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240326052720.4441-1-hkelam@marvell.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-pf: check negative error code in otx2_open() [+ + +]

Author: Su Hui <suhui@nfschina.com>
Date:   Thu Mar 28 10:06:21 2024 +0800

    octeontx2-pf: check negative error code in otx2_open()
    
    commit e709acbd84fb6ef32736331b0147f027a3ef4c20 upstream.
    
    otx2_rxtx_enable() return negative error code such as -EIO,
    check -EIO rather than EIO to fix this problem.
    
    Fixes: c926252205c4 ("octeontx2-pf: Disable packet I/O for graceful exit")
    Signed-off-by: Su Hui <suhui@nfschina.com>
    Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Link: https://lore.kernel.org/r/20240328020620.4054692-1-suhui@nfschina.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

of: dynamic: Synchronize of_changeset_destroy() with the devlink removals [+ + +]

Author: Herve Codina <herve.codina@bootlin.com>
Date:   Mon Mar 25 16:21:26 2024 +0100

    of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
    
    commit 8917e7385346bd6584890ed362985c219fe6ae84 upstream.
    
    In the following sequence:
      1) of_platform_depopulate()
      2) of_overlay_remove()
    
    During the step 1, devices are destroyed and devlinks are removed.
    During the step 2, OF nodes are destroyed but
    __of_changeset_entry_destroy() can raise warnings related to missing
    of_node_put():
      ERROR: memory leak, expected refcount 1 instead of 2 ...
    
    Indeed, during the devlink removals performed at step 1, the removal
    itself releasing the device (and the attached of_node) is done by a job
    queued in a workqueue and so, it is done asynchronously with respect to
    function calls.
    When the warning is present, of_node_put() will be called but wrongly
    too late from the workqueue job.
    
    In order to be sure that any ongoing devlink removals are done before
    the of_node destruction, synchronize the of_changeset_destroy() with the
    devlink removals.
    
    Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
    Cc: stable@vger.kernel.org
    Signed-off-by: Herve Codina <herve.codina@bootlin.com>
    Reviewed-by: Saravana Kannan <saravanak@google.com>
    Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
    Reviewed-by: Nuno Sa <nuno.sa@analog.com>
    Link: https://lore.kernel.org/r/20240325152140.198219-3-herve.codina@bootlin.com
    Signed-off-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/amd/lbr: Use freeze based on availability [+ + +]

Author: Sandipan Das <sandipan.das@amd.com>
Date:   Mon Mar 25 13:01:45 2024 +0530

    perf/x86/amd/lbr: Use freeze based on availability
    
    [ Upstream commit 598c2fafc06fe5c56a1a415fb7b544b31453d637 ]
    
    Currently, the LBR code assumes that LBR Freeze is supported on all processors
    when X86_FEATURE_AMD_LBR_V2 is available i.e. CPUID leaf 0x80000022[EAX]
    bit 1 is set. This is incorrect as the availability of the feature is
    additionally dependent on CPUID leaf 0x80000022[EAX] bit 2 being set,
    which may not be set for all Zen 4 processors.
    
    Define a new feature bit for LBR and PMC freeze and set the freeze enable bit
    (FLBRI) in DebugCtl (MSR 0x1d9) conditionally.
    
    It should still be possible to use LBR without freeze for profile-guided
    optimization of user programs by using an user-only branch filter during
    profiling. When the user-only filter is enabled, branches are no longer
    recorded after the transition to CPL 0 upon PMI arrival. When branch
    entries are read in the PMI handler, the branch stack does not change.
    
    E.g.
    
      $ perf record -j any,u -e ex_ret_brn_tkn ./workload
    
    Since the feature bit is visible under flags in /proc/cpuinfo, it can be
    used to determine the feasibility of use-cases which require LBR Freeze
    to be supported by the hardware such as profile-guided optimization of
    kernels.
    
    Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
    Signed-off-by: Sandipan Das <sandipan.das@amd.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/69a453c97cfd11c6f2584b19f937fe6df741510f.1711091584.git.sandipan.das@amd.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d [+ + +]

Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sat Mar 30 12:49:02 2024 +0100

    r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
    
    commit 5d872c9f46bd2ea3524af3c2420a364a13667135 upstream.
    
    On some boards with this chip version the BIOS is buggy and misses
    to reset the PHY page selector. This results in the PHY ID read
    accessing registers on a different page, returning a more or
    less random value. Fix this by resetting the page selector first.
    
    Fixes: f1e911d5d0df ("r8169: add basic phylib support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/64f2055e-98b8-45ec-8568-665e3d54d4e6@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

r8169: prepare rtl_hw_aspm_clkreq_enable for usage in atomic context [+ + +]

Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Mon Mar 6 22:25:49 2023 +0100

    r8169: prepare rtl_hw_aspm_clkreq_enable for usage in atomic context
    
    [ Upstream commit 49ef7d846d4bd77b0b9f1f801fc765b004690a07 ]
    
    Bail out if the function is used with chip versions that don't support
    ASPM configuration. In addition remove the delay, it tuned out that
    it's not needed, also vendor driver r8125 doesn't have it.
    
    Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
    Tested-by: Holger Hoffstц╓tte <holger@applied-asynchrony.com>
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 5e864d90b208 ("r8169: skip DASH fw status checks when DASH is disabled")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

r8169: skip DASH fw status checks when DASH is disabled [+ + +]

Author: Atlas Yu <atlas.yu@canonical.com>
Date:   Thu Mar 28 13:51:52 2024 +0800

    r8169: skip DASH fw status checks when DASH is disabled
    
    commit 5e864d90b20803edf6bd44a99fb9afa7171785f2 upstream.
    
    On devices that support DASH, the current code in the "rtl_loop_wait" function
    raises false alarms when DASH is disabled. This occurs because the function
    attempts to wait for the DASH firmware to be ready, even though it's not
    relevant in this case.
    
    r8169 0000:0c:00.0 eth0: RTL8168ep/8111ep, 38:7c:76:49:08:d9, XID 502, IRQ 86
    r8169 0000:0c:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
    r8169 0000:0c:00.0 eth0: DASH disabled
    ...
    r8169 0000:0c:00.0 eth0: rtl_ep_ocp_read_cond == 0 (loop: 30, delay: 10000).
    
    This patch modifies the driver start/stop functions to skip checking the DASH
    firmware status when DASH is explicitly disabled. This prevents unnecessary
    delays and false alarms.
    
    The patch has been tested on several ThinkStation P8/PX workstations.
    
    Fixes: 0ab0c45d8aae ("r8169: add handling DASH when DASH is disabled")
    Signed-off-by: Atlas Yu <atlas.yu@canonical.com>
    Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
    Link: https://lore.kernel.org/r/20240328055152.18443-1-atlas.yu@canonical.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

r8169: use spinlock to protect access to registers Config2 and Config5 [+ + +]

Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Mon Mar 6 22:24:00 2023 +0100

    r8169: use spinlock to protect access to registers Config2 and Config5
    
    [ Upstream commit 6bc6c4e6893ee79a9862c61d1635e7da6d5a3333 ]
    
    For disabling ASPM during NAPI poll we'll have to access both registers
    in atomic context. Use a spinlock to protect access.
    
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
    Tested-by: Holger Hoffstц╓tte <holger@applied-asynchrony.com>
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 5e864d90b208 ("r8169: skip DASH fw status checks when DASH is disabled")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

r8169: use spinlock to protect mac ocp register access [+ + +]

Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Mon Mar 6 22:23:15 2023 +0100

    r8169: use spinlock to protect mac ocp register access
    
    [ Upstream commit 91c8643578a21e435c412ffbe902bb4b4773e262 ]
    
    For disabling ASPM during NAPI poll we'll have to access mac ocp
    registers in atomic context. This could result in races because
    a mac ocp read consists of a write to register OCPDR, followed
    by a read from the same register. Therefore add a spinlock to
    protect access to mac ocp registers.
    
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
    Tested-by: Holger Hoffstц╓tte <holger@applied-asynchrony.com>
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 5e864d90b208 ("r8169: skip DASH fw status checks when DASH is disabled")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT" [+ + +]

Author: Johan Hovold <johan+linaro@kernel.org>
Date:   Thu Mar 14 09:44:12 2024 +0100

    Revert "Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT"
    
    commit 4790a73ace86f3d165bbedba898e0758e6e1b82d upstream.
    
    This reverts commit 7dcd3e014aa7faeeaf4047190b22d8a19a0db696.
    
    Qualcomm Bluetooth controllers like WCN6855 do not have persistent
    storage for the Bluetooth address and must therefore start as
    unconfigured to allow the user to set a valid address unless one has
    been provided by the boot firmware in the devicetree.
    
    A recent change snuck into v6.8-rc7 and incorrectly started marking the
    default (non-unique) address as valid. This specifically also breaks the
    Bluetooth setup for some user of the Lenovo ThinkPad X13s.
    
    Note that this is the second time Qualcomm breaks the driver this way
    and that this was fixed last year by commit 6945795bc81a ("Bluetooth:
    fix use-bdaddr-property quirk"), which also has some further details.
    
    Fixes: 7dcd3e014aa7 ("Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT")
    Cc: stable@vger.kernel.org      # 6.8
    Cc: Janaki Ramaiah Thota <quic_janathot@quicinc.com>
    Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
    Reported-by: Clayton Craft <clayton@craftyguy.net>
    Tested-by: Clayton Craft <clayton@craftyguy.net>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped." [+ + +]

Author: Ingo Molnar <mingo@kernel.org>
Date:   Mon Mar 25 11:47:51 2024 +0100

    Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
    
    commit c567f2948f57bdc03ed03403ae0234085f376b7d upstream.
    
    This reverts commit d794734c9bbfe22f86686dc2909c25f5ffe1a572.
    
    While the original change tries to fix a bug, it also unintentionally broke
    existing systems, see the regressions reported at:
    
      https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/
    
    Since d794734c9bbf was also marked for -stable, let's back it out before
    causing more damage.
    
    Note that due to another upstream change the revert was not 100% automatic:
    
      0a845e0f6348 mm/treewide: replace pud_large() with pud_leaf()
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: <stable@vger.kernel.org>
    Cc: Russ Anderson <rja@hpe.com>
    Cc: Steve Wahl <steve.wahl@hpe.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Link: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/
    Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
    Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: Fix spurious errors from __get/put_kernel_nofault [+ + +]

Author: Samuel Holland <samuel.holland@sifive.com>
Date:   Mon Mar 11 19:19:13 2024 -0700

    riscv: Fix spurious errors from __get/put_kernel_nofault
    
    commit d080a08b06b6266cc3e0e86c5acfd80db937cb6b upstream.
    
    These macros did not initialize __kr_err, so they could fail even if
    the access did not fault.
    
    Cc: stable@vger.kernel.org
    Fixes: d464118cdc41 ("riscv: implement __get_kernel_nofault and __put_user_nofault")
    Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
    Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
    Link: https://lore.kernel.org/r/20240312022030.320789-1-samuel.holland@sifive.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: process: Fix kernel gp leakage [+ + +]

Author: Stefan O'Rear <sorear@fastmail.com>
Date:   Wed Mar 27 02:12:58 2024 -0400

    riscv: process: Fix kernel gp leakage
    
    commit d14fa1fcf69db9d070e75f1c4425211fa619dfc8 upstream.
    
    childregs represents the registers which are active for the new thread
    in user context. For a kernel thread, childregs->gp is never used since
    the kernel gp is not touched by switch_to. For a user mode helper, the
    gp value can be observed in user space after execve or possibly by other
    means.
    
    [From the email thread]
    
    The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
    for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
    when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
    PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
    
    childregs is the *user* context during syscall execution and it is observable
    from userspace in at least five ways:
    
    1. kernel_execve does not currently clear integer registers, so the starting
       register state for PID 1 and other user processes started by the kernel has
       sp = user stack, gp = kernel __global_pointer$, all other integer registers
       zeroed by the memset in the patch comment.
    
       This is a bug in its own right, but I'm unwilling to bet that it is the only
       way to exploit the issue addressed by this patch.
    
    2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
       before it execs, but ptrace requires SIGSTOP to be delivered which can only
       happen at user/kernel boundaries.
    
    3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
       user_mode_helpers before the exec completes, but gp is not one of the
       registers it returns.
    
    4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
       addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
       are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
       LOCKDOWN_PERF. I have not attempted to write exploit code.
    
    5. Much of the tracing infrastructure allows access to user registers. I have
       not attempted to determine which forms of tracing allow access to user
       registers without already allowing access to kernel registers.
    
    Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
    Cc: stable@vger.kernel.org
    Signed-off-by: Stefan O'Rear <sorear@fastmail.com>
    Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/entry: align system call table on 8 bytes [+ + +]

Author: Sumanth Korikkar <sumanthk@linux.ibm.com>
Date:   Tue Mar 26 18:12:13 2024 +0100

    s390/entry: align system call table on 8 bytes
    
    commit 378ca2d2ad410a1cd5690d06b46c5e2297f4c8c0 upstream.
    
    Align system call table on 8 bytes. With sys_call_table entry size
    of 8 bytes that eliminates the possibility of a system call pointer
    crossing cache line boundary.
    
    Cc: stable@kernel.org
    Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
    Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
    Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
    Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

s390/qeth: handle deferred cc1 [+ + +]

Author: Alexandra Winter <wintera@linux.ibm.com>
Date:   Thu Mar 21 12:53:37 2024 +0100

    s390/qeth: handle deferred cc1
    
    [ Upstream commit afb373ff3f54c9d909efc7f810dc80a9742807b2 ]
    
    The IO subsystem expects a driver to retry a ccw_device_start, when the
    subsequent interrupt response block (irb) contains a deferred
    condition code 1.
    
    Symptoms before this commit:
    On the read channel we always trigger the next read anyhow, so no
    different behaviour here.
    On the write channel we may experience timeout errors, because the
    expected reply will never be received without the retry.
    Other callers of qeth_send_control_data() may wrongly assume that the ccw
    was successful, which may cause problems later.
    
    Note that since
    commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
    and
    commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start")
    deferred CC1s are much more likely to occur. See the commit message of the
    latter for more background information.
    
    Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
    Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
    Co-developed-by: Thorsten Winkler <twinkler@linux.ibm.com>
    Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
    Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
    Link: https://lore.kernel.org/r/20240321115337.3564694-1-wintera@linux.ibm.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scripts/bpf_doc: Use silent mode when exec make cmd [+ + +]

Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Fri Mar 15 10:34:43 2024 +0800

    scripts/bpf_doc: Use silent mode when exec make cmd
    
    [ Upstream commit 5384cc0d1a88c27448a6a4e65b8abe6486de8012 ]
    
    When getting kernel version via make, the result may be polluted by other
    output, like directory change info. e.g.
    
      $ export MAKEFLAGS="-w"
      $ make kernelversion
      make: Entering directory '/home/net'
      6.8.0
      make: Leaving directory '/home/net'
    
    This will distort the reStructuredText output and make latter rst2man
    failed like:
    
      [...]
      bpf-helpers.rst:20: (WARNING/2) Field list ends without a blank line; unexpected unindent.
      [...]
    
    Using silent mode would help. e.g.
    
      $ make -s --no-print-directory kernelversion
      6.8.0
    
    Fixes: fd0a38f9c37d ("scripts/bpf: Set version attribute for bpf-helpers(7) man page")
    Signed-off-by: Michael Hofmann <mhofmann@redhat.com>
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Quentin Monnet <qmo@kernel.org>
    Acked-by: Alejandro Colomar <alx@kernel.org>
    Link: https://lore.kernel.org/bpf/20240315023443.2364442-1-liuhangbin@gmail.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: mylex: Fix sysfs buffer lengths [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Mar 26 23:38:06 2024 +0100

    scsi: mylex: Fix sysfs buffer lengths
    
    [ Upstream commit 1197c5b2099f716b3de327437fb50900a0b936c9 ]
    
    The myrb and myrs drivers use an odd way of implementing their sysfs files,
    calling snprintf() with a fixed length of 32 bytes to print into a page
    sized buffer. One of the strings is actually longer than 32 bytes, which
    clang can warn about:
    
    drivers/scsi/myrb.c:1906:10: error: 'snprintf' will always be truncated; specified size is 32, but format string expands to at least 34 [-Werror,-Wformat-truncation]
    drivers/scsi/myrs.c:1089:10: error: 'snprintf' will always be truncated; specified size is 32, but format string expands to at least 34 [-Werror,-Wformat-truncation]
    
    These could all be plain sprintf() without a length as the buffer is always
    long enough. On the other hand, sysfs files should not be overly long
    either, so just double the length to make sure the longest strings don't
    get truncated here.
    
    Fixes: 77266186397c ("scsi: myrs: Add Mylex RAID controller (SCSI interface)")
    Fixes: 081ff398c56c ("scsi: myrb: Add Mylex RAID controller (block interface)")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Link: https://lore.kernel.org/r/20240326223825.4084412-8-arnd@kernel.org
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: sd: Unregister device if device_add_disk() failed in sd_probe() [+ + +]

Author: Li Nan <linan122@huawei.com>
Date:   Fri Dec 8 16:23:35 2023 +0800

    scsi: sd: Unregister device if device_add_disk() failed in sd_probe()
    
    [ Upstream commit 0296bea01cfa6526be6bd2d16dc83b4e7f1af91f ]
    
    "if device_add() succeeds, you should call device_del() when you want to
    get rid of it."
    
    In sd_probe(), device_add_disk() fails when device_add() has already
    succeeded, so change put_device() to device_unregister() to ensure device
    resources are released.
    
    Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
    Signed-off-by: Li Nan <linan122@huawei.com>
    Link: https://lore.kernel.org/r/20231208082335.1754205-1-linan666@huaweicloud.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: mptcp: display simult in extra_msg [+ + +]

Author: Geliang Tang <geliang@kernel.org>
Date:   Wed Oct 25 16:37:11 2023 -0700

    selftests: mptcp: display simult in extra_msg
    
    commit 629b35a225b0d49fbcff3b5c22e3b983c7c7b36f upstream.
    
    Just like displaying "invert" after "Info: ", "simult" should be
    displayed too when rm_subflow_nr doesn't match the expect value in
    chk_rm_nr():
    
          syn                                 [ ok ]
          synack                              [ ok ]
          ack                                 [ ok ]
          add                                 [ ok ]
          echo                                [ ok ]
          rm                                  [ ok ]
          rmsf                                [ ok ] 3 in [2:4]
          Info: invert simult
    
          syn                                 [ ok ]
          synack                              [ ok ]
          ack                                 [ ok ]
          add                                 [ ok ]
          echo                                [ ok ]
          rm                                  [ ok ]
          rmsf                                [ ok ]
          Info: invert
    
    Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-10-db8f25f798eb@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: mptcp: join: fix dev in check_endpoint [+ + +]

Author: Geliang Tang <tanggeliang@kylinos.cn>
Date:   Fri Mar 29 13:08:53 2024 +0100

    selftests: mptcp: join: fix dev in check_endpoint
    
    commit 40061817d95bce6dd5634a61a65cd5922e6ccc92 upstream.
    
    There's a bug in pm_nl_check_endpoint(), 'dev' didn't be parsed correctly.
    If calling it in the 2nd test of endpoint_tests() too, it fails with an
    error like this:
    
     creation  [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \
                         found '10.0.2.2 id 2 subflow dev ns2eth2'
    
    The reason is '$2' should be set to 'dev', not '$1'. This patch fixes it.
    
    Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
    Cc: stable@vger.kernel.org
    Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-2-324a8981da48@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    [ Conflicts in mptcp_join.sh: only the fix has been added, not the
      verification because this modified subtest is quite different in
      v6.1: to add this verification, we would need to change a bit the
      subtest: pm_nl_check_endpoint() takes an extra argument for the
      title, the next chk_subflow_nr() will no longer need the title, etc.
      Easier with only the fix without the extra test. ]
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: net: gro fwd: update vxlan GRO test expectations [+ + +]

Author: Antoine Tenart <atenart@kernel.org>
Date:   Tue Mar 26 12:34:02 2024 +0100

    selftests: net: gro fwd: update vxlan GRO test expectations
    
    commit 0fb101be97ca27850c5ecdbd1269423ce4d1f607 upstream.
    
    UDP tunnel packets can't be GRO in-between their endpoints as this
    causes different issues. The UDP GRO fwd vxlan tests were relying on
    this and their expectations have to be fixed.
    
    We keep both vxlan tests and expected no GRO from happening. The vxlan
    UDP GRO bench test was removed as it's not providing any valuable
    information now.
    
    Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests")
    Signed-off-by: Antoine Tenart <atenart@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests: reuseaddr_conflict: add missing new line at the end of the output [+ + +]

Author: Jakub Kicinski <kuba@kernel.org>
Date:   Fri Mar 29 09:05:59 2024 -0700

    selftests: reuseaddr_conflict: add missing new line at the end of the output
    
    commit 31974122cfdeaf56abc18d8ab740d580d9833e90 upstream.
    
    The netdev CI runs in a VM and captures serial, so stdout and
    stderr get combined. Because there's a missing new line in
    stderr the test ends up corrupting KTAP:
    
      # Successok 1 selftests: net: reuseaddr_conflict
    
    which should have been:
    
      # Success
      ok 1 selftests: net: reuseaddr_conflict
    
    Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test")
    Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
    Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb3: retrying on failed server close [+ + +]

Author: Ritvik Budhiraja <rbudhiraja@microsoft.com>
Date:   Tue Apr 2 14:01:28 2024 -0500

    smb3: retrying on failed server close
    
    commit 173217bd73365867378b5e75a86f0049e1069ee8 upstream.
    
    In the current implementation, CIFS close sends a close to the
    server and does not check for the success of the server close.
    This patch adds functionality to check for server close return
    status and retries in case of an EBUSY or EAGAIN error.
    
    This can help avoid handle leaks
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Ritvik Budhiraja <rbudhiraja@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix potential UAF in cifs_debug_files_proc_show() [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Tue Apr 2 16:33:53 2024 -0300

    smb: client: fix potential UAF in cifs_debug_files_proc_show()
    
    commit ca545b7f0823f19db0f1148d59bc5e1a56634502 upstream.
    
    Skip sessions that are being teared down (status == SES_EXITING) to
    avoid UAF.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect() [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Tue Apr 2 16:34:04 2024 -0300

    smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()
    
    commit e0e50401cc3921c9eaf1b0e667db174519ea939f upstream.
    
    Skip sessions that are being teared down (status == SES_EXITING) to
    avoid UAF.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix potential UAF in cifs_stats_proc_show() [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Tue Apr 2 16:33:56 2024 -0300

    smb: client: fix potential UAF in cifs_stats_proc_show()
    
    commit 0865ffefea197b437ba78b5dd8d8e256253efd65 upstream.
    
    Skip sessions that are being teared down (status == SES_EXITING) to
    avoid UAF.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix potential UAF in cifs_stats_proc_write() [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Tue Apr 2 16:33:55 2024 -0300

    smb: client: fix potential UAF in cifs_stats_proc_write()
    
    commit d3da25c5ac84430f89875ca7485a3828150a7e0a upstream.
    
    Skip sessions that are being teared down (status == SES_EXITING) to
    avoid UAF.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix potential UAF in is_valid_oplock_break() [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Tue Apr 2 16:34:00 2024 -0300

    smb: client: fix potential UAF in is_valid_oplock_break()
    
    commit 69ccf040acddf33a3a85ec0f6b45ef84b0f7ec29 upstream.
    
    Skip sessions that are being teared down (status == SES_EXITING) to
    avoid UAF.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix potential UAF in smb2_is_network_name_deleted() [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Tue Apr 2 16:34:02 2024 -0300

    smb: client: fix potential UAF in smb2_is_network_name_deleted()
    
    commit 63981561ffd2d4987807df4126f96a11e18b0c1d upstream.
    
    Skip sessions that are being teared down (status == SES_EXITING) to
    avoid UAF.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix potential UAF in smb2_is_valid_lease_break() [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Tue Apr 2 16:33:58 2024 -0300

    smb: client: fix potential UAF in smb2_is_valid_lease_break()
    
    commit 705c76fbf726c7a2f6ff9143d4013b18daaaebf1 upstream.
    
    Skip sessions that are being teared down (status == SES_EXITING) to
    avoid UAF.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

smb: client: fix potential UAF in smb2_is_valid_oplock_break() [+ + +]

Author: Paulo Alcantara <pc@manguebit.com>
Date:   Tue Apr 2 16:33:59 2024 -0300

    smb: client: fix potential UAF in smb2_is_valid_oplock_break()
    
    commit 22863485a4626ec6ecf297f4cc0aef709bc862e4 upstream.
    
    Skip sessions that are being teared down (status == SES_EXITING) to
    avoid UAF.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tcp: Fix bind() regression for v6-only wildcard and v4(-mapped-v6) non-wildcard addresses. [+ + +]

Author: Kuniyuki Iwashima <kuniyu@amazon.com>
Date:   Tue Mar 26 13:42:45 2024 -0700

    tcp: Fix bind() regression for v6-only wildcard and v4(-mapped-v6) non-wildcard addresses.
    
    [ Upstream commit d91ef1e1b55f730bee8ce286b02b7bdccbc42973 ]
    
    Jianguo Wu reported another bind() regression introduced by bhash2.
    
    Calling bind() for the following 3 addresses on the same port, the
    3rd one should fail but now succeeds.
    
      1. 0.0.0.0 or ::ffff:0.0.0.0
      2. [::] w/ IPV6_V6ONLY
      3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
    
    The first two bind() create tb2 like this:
    
      bhash2 -> tb2(:: w/ IPV6_V6ONLY) -> tb2(0.0.0.0)
    
    The 3rd bind() will match with the IPv6 only wildcard address bucket
    in inet_bind2_bucket_match_addr_any(), however, no conflicting socket
    exists in the bucket.  So, inet_bhash2_conflict() will returns false,
    and thus, inet_bhash2_addr_any_conflict() returns false consequently.
    
    As a result, the 3rd bind() bypasses conflict check, which should be
    done against the IPv4 wildcard address bucket.
    
    So, in inet_bhash2_addr_any_conflict(), we must iterate over all buckets.
    
    Note that we cannot add ipv6_only flag for inet_bind2_bucket as it
    would confuse the following patetrn.
    
      1. [::] w/ SO_REUSE{ADDR,PORT} and IPV6_V6ONLY
      2. [::] w/ SO_REUSE{ADDR,PORT}
      3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
    
    The first bind() would create a bucket with ipv6_only flag true,
    the second bind() would add the [::] socket into the same bucket,
    and the third bind() could succeed based on the wrong assumption
    that ipv6_only bucket would not conflict with v4(-mapped-v6) address.
    
    Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
    Diagnosed-by: Jianguo Wu <wujianguo106@163.com>
    Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Link: https://lore.kernel.org/r/20240326204251.51301-3-kuniyu@amazon.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tcp: properly terminate timers for kernel sockets [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Mar 22 13:57:32 2024 +0000

    tcp: properly terminate timers for kernel sockets
    
    [ Upstream commit 151c9c724d05d5b0dd8acd3e11cb69ef1f2dbada ]
    
    We had various syzbot reports about tcp timers firing after
    the corresponding netns has been dismantled.
    
    Fortunately Josef Bacik could trigger the issue more often,
    and could test a patch I wrote two years ago.
    
    When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
    to 'stop' the timers.
    
    inet_csk_clear_xmit_timers() can be called from any context,
    including when socket lock is held.
    This is the reason it uses sk_stop_timer(), aka del_timer().
    This means that ongoing timers might finish much later.
    
    For user sockets, this is fine because each running timer
    holds a reference on the socket, and the user socket holds
    a reference on the netns.
    
    For kernel sockets, we risk that the netns is freed before
    timer can complete, because kernel sockets do not hold
    reference on the netns.
    
    This patch adds inet_csk_clear_xmit_timers_sync() function
    that using sk_stop_timer_sync() to make sure all timers
    are terminated before the kernel socket is released.
    Modules using kernel sockets close them in their netns exit()
    handler.
    
    Also add sock_not_owned_by_me() helper to get LOCKDEP
    support : inet_csk_clear_xmit_timers_sync() must not be called
    while socket lock is held.
    
    It is very possible we can revert in the future commit
    3a58f13a881e ("net: rds: acquire refcount on TCP sockets")
    which attempted to solve the issue in rds only.
    (net/smc/af_smc.c and net/mptcp/subflow.c have similar code)
    
    We probably can remove the check_net() tests from
    tcp_out_of_resources() and __tcp_close() in the future.
    
    Reported-by: Josef Bacik <josef@toxicpanda.com>
    Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/
    Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
    Fixes: 8a68173691f0 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
    Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Tested-by: Josef Bacik <josef@toxicpanda.com>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tls: adjust recv return with async crypto and failed copy to userspace [+ + +]

Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Mon Mar 25 16:56:46 2024 +0100

    tls: adjust recv return with async crypto and failed copy to userspace
    
    [ Upstream commit 85eef9a41d019b59be7bc91793f26251909c0710 ]
    
    process_rx_list may not copy as many bytes as we want to the userspace
    buffer, for example in case we hit an EFAULT during the copy. If this
    happens, we should only count the bytes that were actually copied,
    which may be 0.
    
    Subtracting async_copy_bytes is correct in both peek and !peek cases,
    because decrypted == async_copy_bytes + peeked for the peek case: peek
    is always !ZC, and we can go through either the sync or async path. In
    the async case, we add chunk to both decrypted and
    async_copy_bytes. In the sync case, we add chunk to both decrypted and
    peeked. I missed that in commit 6caaf104423d ("tls: fix peeking with
    sync+async decryption").
    
    Fixes: 4d42cd6bc2ac ("tls: rx: fix return value for async crypto")
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/1b5a1eaab3c088a9dd5d9f1059ceecd7afe888d1.1711120964.git.sd@queasysnail.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tls: get psock ref after taking rxlock to avoid leak [+ + +]

Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Mon Mar 25 16:56:48 2024 +0100

    tls: get psock ref after taking rxlock to avoid leak
    
    [ Upstream commit 417e91e856099e9b8a42a2520e2255e6afe024be ]
    
    At the start of tls_sw_recvmsg, we take a reference on the psock, and
    then call tls_rx_reader_lock. If that fails, we return directly
    without releasing the reference.
    
    Instead of adding a new label, just take the reference after locking
    has succeeded, since we don't need it before.
    
    Fixes: 4cbc325ed6b4 ("tls: rx: allow only one reader at a time")
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/fe2ade22d030051ce4c3638704ed58b67d0df643.1711120964.git.sd@queasysnail.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

tls: recv: process_rx_list shouldn't use an offset with kvec [+ + +]

Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Mon Mar 25 16:56:45 2024 +0100

    tls: recv: process_rx_list shouldn't use an offset with kvec
    
    [ Upstream commit 7608a971fdeb4c3eefa522d1bfe8d4bc6b2481cc ]
    
    Only MSG_PEEK needs to copy from an offset during the final
    process_rx_list call, because the bytes we copied at the beginning of
    tls_sw_recvmsg were left on the rx_list. In the KVEC case, we removed
    data from the rx_list as we were copying it, so there's no need to use
    an offset, just like in the normal case.
    
    Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records")
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/e5487514f828e0347d2b92ca40002c62b58af73d.1711120964.git.sd@queasysnail.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

udp: do not accept non-tunnel GSO skbs landing in a tunnel [+ + +]

Author: Antoine Tenart <atenart@kernel.org>
Date:   Tue Mar 26 12:33:58 2024 +0100

    udp: do not accept non-tunnel GSO skbs landing in a tunnel
    
    commit 3d010c8031e39f5fa1e8b13ada77e0321091011f upstream.
    
    When rx-udp-gro-forwarding is enabled UDP packets might be GROed when
    being forwarded. If such packets might land in a tunnel this can cause
    various issues and udp_gro_receive makes sure this isn't the case by
    looking for a matching socket. This is performed in
    udp4/6_gro_lookup_skb but only in the current netns. This is an issue
    with tunneled packets when the endpoint is in another netns. In such
    cases the packets will be GROed at the UDP level, which leads to various
    issues later on. The same thing can happen with rx-gro-list.
    
    We saw this with geneve packets being GROed at the UDP level. In such
    case gso_size is set; later the packet goes through the geneve rx path,
    the geneve header is pulled, the offset are adjusted and frag_list skbs
    are not adjusted with regard to geneve. When those skbs hit
    skb_fragment, it will misbehave. Different outcomes are possible
    depending on what the GROed skbs look like; from corrupted packets to
    kernel crashes.
    
    One example is a BUG_ON[1] triggered in skb_segment while processing the
    frag_list. Because gso_size is wrong (geneve header was pulled)
    skb_segment thinks there is "geneve header size" of data in frag_list,
    although it's in fact the next packet. The BUG_ON itself has nothing to
    do with the issue. This is only one of the potential issues.
    
    Looking up for a matching socket in udp_gro_receive is fragile: the
    lookup could be extended to all netns (not speaking about performances)
    but nothing prevents those packets from being modified in between and we
    could still not find a matching socket. It's OK to keep the current
    logic there as it should cover most cases but we also need to make sure
    we handle tunnel packets being GROed too early.
    
    This is done by extending the checks in udp_unexpected_gso: GSO packets
    lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must
    be segmented.
    
    [1] kernel BUG at net/core/skbuff.c:4408!
        RIP: 0010:skb_segment+0xd2a/0xf70
        __udp_gso_segment+0xaa/0x560
    
    Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
    Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
    Signed-off-by: Antoine Tenart <atenart@kernel.org>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

udp: do not transition UDP GRO fraglist partial checksums to unnecessary [+ + +]

Author: Antoine Tenart <atenart@kernel.org>
Date:   Tue Mar 26 12:34:00 2024 +0100

    udp: do not transition UDP GRO fraglist partial checksums to unnecessary
    
    commit f0b8c30345565344df2e33a8417a27503589247d upstream.
    
    UDP GRO validates checksums and in udp4/6_gro_complete fraglist packets
    are converted to CHECKSUM_UNNECESSARY to avoid later checks. However
    this is an issue for CHECKSUM_PARTIAL packets as they can be looped in
    an egress path and then their partial checksums are not fixed.
    
    Different issues can be observed, from invalid checksum on packets to
    traces like:
    
      gen01: hw csum failure
      skb len=3008 headroom=160 headlen=1376 tailroom=0
      mac=(106,14) net=(120,40) trans=160
      shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
      csum(0xffff232e ip_summed=2 complete_sw=0 valid=0 level=0)
      hash(0x77e3d716 sw=1 l4=1) proto=0x86dd pkttype=0 iif=12
      ...
    
    Fix this by only converting CHECKSUM_NONE packets to
    CHECKSUM_UNNECESSARY by reusing __skb_incr_checksum_unnecessary. All
    other checksum types are kept as-is, including CHECKSUM_COMPLETE as
    fraglist packets being segmented back would have their skb->csum valid.
    
    Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
    Signed-off-by: Antoine Tenart <atenart@kernel.org>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

udp: prevent local UDP tunnel packets from being GROed [+ + +]

Author: Antoine Tenart <atenart@kernel.org>
Date:   Tue Mar 26 12:34:01 2024 +0100

    udp: prevent local UDP tunnel packets from being GROed
    
    commit 64235eabc4b5b18c507c08a1f16cdac6c5661220 upstream.
    
    GRO has a fundamental issue with UDP tunnel packets as it can't detect
    those in a foolproof way and GRO could happen before they reach the
    tunnel endpoint. Previous commits have fixed issues when UDP tunnel
    packets come from a remote host, but if those packets are issued locally
    they could run into checksum issues.
    
    If the inner packet has a partial checksum the information will be lost
    in the GRO logic, either in udp4/6_gro_complete or in
    udp_gro_complete_segment and packets will have an invalid checksum when
    leaving the host.
    
    Prevent local UDP tunnel packets from ever being GROed at the outer UDP
    level.
    
    Due to skb->encapsulation being wrongly used in some drivers this is
    actually only preventing UDP tunnel packets with a partial checksum to
    be GROed (see iptunnel_handle_offloads) but those were also the packets
    triggering issues so in practice this should be sufficient.
    
    Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
    Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
    Suggested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Antoine Tenart <atenart@kernel.org>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vboxsf: Avoid an spurious warning if load_nls_xxx() fails [+ + +]

Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date:   Wed Nov 1 11:49:48 2023 +0100

    vboxsf: Avoid an spurious warning if load_nls_xxx() fails
    
    commit de3f64b738af57e2732b91a0774facc675b75b54 upstream.
    
    If an load_nls_xxx() function fails a few lines above, the 'sbi->bdi_id' is
    still 0.
    So, in the error handling path, we will call ida_simple_remove(..., 0)
    which is not allocated yet.
    
    In order to prevent a spurious "ida_free called for id=0 which is not
    allocated." message, tweak the error handling path and add a new label.
    
    Fixes: 0fd169576648 ("fs: Add VirtualBox guest shared folder (vboxsf) support")
    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Link: https://lore.kernel.org/r/d09eaaa4e2e08206c58a1a27ca9b3e81dc168773.1698835730.git.christophe.jaillet@wanadoo.fr
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vsock/virtio: fix packet delivery to tap device [+ + +]

Author: Marco Pinna <marco.pinn95@gmail.com>
Date:   Fri Mar 29 17:12:59 2024 +0100

    vsock/virtio: fix packet delivery to tap device
    
    commit b32a09ea7c38849ff925489a6bf5bd8914bc45df upstream.
    
    Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added
    virtio_transport_deliver_tap_pkt() for handing packets to the
    vsockmon device. However, in virtio_transport_send_pkt_work(),
    the function is called before actually sending the packet (i.e.
    before placing it in the virtqueue with virtqueue_add_sgs() and checking
    whether it returned successfully).
    Queuing the packet in the virtqueue can fail even multiple times.
    However, in virtio_transport_deliver_tap_pkt() we deliver the packet
    to the monitoring tap interface only the first time we call it.
    This certainly avoids seeing the same packet replicated multiple times
    in the monitoring interface, but it can show the packet sent with the
    wrong timestamp or even before we succeed to queue it in the virtqueue.
    
    Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
    and making sure it returned successfully.
    
    Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks")
    Cc: stable@vge.kernel.org
    Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: iwlwifi: mvm: rfi: fix potential response leaks [+ + +]

Author: Johannes Berg <johannes.berg@intel.com>
Date:   Tue Mar 19 10:10:17 2024 +0200

    wifi: iwlwifi: mvm: rfi: fix potential response leaks
    
    [ Upstream commit 06a093807eb7b5c5b29b6cff49f8174a4e702341 ]
    
    If the rx payload length check fails, or if kmemdup() fails,
    we still need to free the command response. Fix that.
    
    Fixes: 21254908cbe9 ("iwlwifi: mvm: add RFI-M support")
    Co-authored-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Link: https://msgid.link/20240319100755.db2fa0196aa7.I116293b132502ac68a65527330fa37799694b79c@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/bhi: Add BHI mitigation knob [+ + +]

Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date:   Mon Mar 11 08:57:05 2024 -0700

    x86/bhi: Add BHI mitigation knob
    
    commit ec9404e40e8f36421a2b66ecb76dc2209fe7f3ef upstream.
    
    Branch history clearing software sequences and hardware control
    BHI_DIS_S were defined to mitigate Branch History Injection (BHI).
    
    Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation:
    
     auto - Deploy the hardware mitigation BHI_DIS_S, if available.
     on   - Deploy the hardware mitigation BHI_DIS_S, if available,
            otherwise deploy the software sequence at syscall entry and
            VMexit.
     off  - Turn off BHI mitigation.
    
    The default is auto mode which does not deploy the software sequence
    mitigation.  This is because of the hardening done in the syscall
    dispatch path, which is the likely target of BHI.
    
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bhi: Add support for clearing branch history at syscall entry [+ + +]

Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date:   Mon Mar 11 08:56:58 2024 -0700

    x86/bhi: Add support for clearing branch history at syscall entry
    
    commit 7390db8aea0d64e9deb28b8e1ce716f5020c7ee5 upstream.
    
    Branch History Injection (BHI) attacks may allow a malicious application to
    influence indirect branch prediction in kernel by poisoning the branch
    history. eIBRS isolates indirect branch targets in ring0.  The BHB can
    still influence the choice of indirect branch predictor entry, and although
    branch predictor entries are isolated between modes when eIBRS is enabled,
    the BHB itself is not isolated between modes.
    
    Alder Lake and new processors supports a hardware control BHI_DIS_S to
    mitigate BHI.  For older processors Intel has released a software sequence
    to clear the branch history on parts that don't support BHI_DIS_S. Add
    support to execute the software sequence at syscall entry and VMexit to
    overwrite the branch history.
    
    For now, branch history is not cleared at interrupt entry, as malicious
    applications are not believed to have sufficient control over the
    registers, since previous register state is cleared at interrupt
    entry. Researchers continue to poke at this area and it may become
    necessary to clear at interrupt entry as well in the future.
    
    This mitigation is only defined here. It is enabled later.
    
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Co-developed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bhi: Define SPEC_CTRL_BHI_DIS_S [+ + +]

Author: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Date:   Wed Mar 13 09:47:57 2024 -0700

    x86/bhi: Define SPEC_CTRL_BHI_DIS_S
    
    commit 0f4a837615ff925ba62648d280a861adf1582df7 upstream.
    
    Newer processors supports a hardware control BHI_DIS_S to mitigate
    Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel
    from userspace BHI attacks without having to manually overwrite the
    branch history.
    
    Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL.
    Mitigation is enabled later.
    
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bhi: Enumerate Branch History Injection (BHI) bug [+ + +]

Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date:   Mon Mar 11 08:57:03 2024 -0700

    x86/bhi: Enumerate Branch History Injection (BHI) bug
    
    commit be482ff9500999f56093738f9219bbabc729d163 upstream.
    
    Mitigation for BHI is selected based on the bug enumeration. Add bits
    needed to enumerate BHI bug.
    
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bhi: Mitigate KVM by default [+ + +]

Author: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date:   Mon Mar 11 08:57:09 2024 -0700

    x86/bhi: Mitigate KVM by default
    
    commit 95a6ccbdc7199a14b71ad8901cb788ba7fb5167b upstream.
    
    BHI mitigation mode spectre_bhi=auto does not deploy the software
    mitigation by default. In a cloud environment, it is a likely scenario
    where userspace is trusted but the guests are not trusted. Deploying
    system wide mitigation in such cases is not desirable.
    
    Update the auto mode to unconditionally mitigate against malicious
    guests. Deploy the software sequence at VMexit in auto mode also, when
    hardware mitigation is not available. Unlike the force =on mode,
    software sequence is not deployed at syscalls in auto mode.
    
    Suggested-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file [+ + +]

Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Fri Apr 5 11:14:13 2024 -0700

    x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
    
    commit 0cd01ac5dcb1e18eb18df0f0d05b5de76522a437 upstream.
    
    Change the format of the 'spectre_v2' vulnerabilities sysfs file
    slightly by converting the commas to semicolons, so that mitigations for
    future variants can be grouped together and separated by commas.
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/bugs: Fix the SRSO mitigation on Zen3/4 [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Thu Mar 28 13:59:05 2024 +0100

    x86/bugs: Fix the SRSO mitigation on Zen3/4
    
    commit 4535e1a4174c4111d92c5a9a21e542d232e0fcaa upstream.
    
    The original version of the mitigation would patch in the calls to the
    untraining routines directly.  That is, the alternative() in UNTRAIN_RET
    will patch in the CALL to srso_alias_untrain_ret() directly.
    
    However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
    mess") meant well in trying to clean up the situation, due to micro-
    architectural reasons, the untraining routine srso_alias_untrain_ret()
    must be the target of a CALL instruction and not of a JMP instruction as
    it is done now.
    
    Reshuffle the alternative macros to accomplish that.
    
    Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Ingo Molnar <mingo@kernel.org>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/coco: Require seeding RNG with RDRAND on CoCo systems [+ + +]

Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Tue Mar 26 17:07:35 2024 +0100

    x86/coco: Require seeding RNG with RDRAND on CoCo systems
    
    commit 99485c4c026f024e7cb82da84c7951dbe3deb584 upstream.
    
    There are few uses of CoCo that don't rely on working cryptography and
    hence a working RNG. Unfortunately, the CoCo threat model means that the
    VM host cannot be trusted and may actively work against guests to
    extract secrets or manipulate computation. Since a malicious host can
    modify or observe nearly all inputs to guests, the only remaining source
    of entropy for CoCo guests is RDRAND.
    
    If RDRAND is broken -- due to CPU hardware fault -- the RNG as a whole
    is meant to gracefully continue on gathering entropy from other sources,
    but since there aren't other sources on CoCo, this is catastrophic.
    This is mostly a concern at boot time when initially seeding the RNG, as
    after that the consequences of a broken RDRAND are much more
    theoretical.
    
    So, try at boot to seed the RNG using 256 bits of RDRAND output. If this
    fails, panic(). This will also trigger if the system is booted without
    RDRAND, as RDRAND is essential for a safe CoCo boot.
    
    Add this deliberately to be "just a CoCo x86 driver feature" and not
    part of the RNG itself. Many device drivers and platforms have some
    desire to contribute something to the RNG, and add_device_randomness()
    is specifically meant for this purpose.
    
    Any driver can call it with seed data of any quality, or even garbage
    quality, and it can only possibly make the quality of the RNG better or
    have no effect, but can never make it worse.
    
    Rather than trying to build something into the core of the RNG, consider
    the particular CoCo issue just a CoCo issue, and therefore separate it
    all out into driver (well, arch/platform) code.
    
      [ bp: Massage commit message. ]
    
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Elena Reshetova <elena.reshetova@intel.com>
    Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reviewed-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20240326160735.73531-1-Jason@zx2c4.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Thu Apr 4 17:16:14 2024 -0700

    x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word
    
    commit 8cb4a9a82b21623dbb4b3051dd30d98356cf95bc upstream.
    
    Add CPUID_LNX_5 to track cpufeatures' word 21, and add the appropriate
    compile-time assert in KVM to prevent direct lookups on the features in
    CPUID_LNX_5.  KVM uses X86_FEATURE_* flags to manage guest CPUID, and so
    must translate features that are scattered by Linux from the Linux-defined
    bit to the hardware-defined bit, i.e. should never try to directly access
    scattered features in guest CPUID.
    
    Opportunistically add NR_CPUID_WORDS to enum cpuid_leafs, along with a
    compile-time assert in KVM's CPUID infrastructure to ensure that future
    additions update cpuid_leafs along with NCAPINTS.
    
    No functional change intended.
    
    Fixes: 7f274e609f3d ("x86/cpufeatures: Add new word for scattered features")
    Cc: Sandipan Das <sandipan.das@amd.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/cpufeatures: Add new word for scattered features [+ + +]

Author: Sandipan Das <sandipan.das@amd.com>
Date:   Mon Mar 25 13:01:44 2024 +0530

    x86/cpufeatures: Add new word for scattered features
    
    [ Upstream commit 7f274e609f3d5f45c22b1dd59053f6764458b492 ]
    
    Add a new word for scattered features because all free bits among the
    existing Linux-defined auxiliary flags have been exhausted.
    
    Signed-off-by: Sandipan Das <sandipan.das@amd.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/8380d2a0da469a1f0ad75b8954a79fb689599ff6.1711091584.git.sandipan.das@amd.com
    Stable-dep-of: 598c2fafc06f ("perf/x86/amd/lbr: Use freeze based on availability")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/mce: Make sure to grab mce_sysfs_mutex in set_bank() [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Wed Mar 13 14:48:27 2024 +0100

    x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
    
    commit 3ddf944b32f88741c303f0b21459dbb3872b8bc5 upstream.
    
    Modifying a MCA bank's MCA_CTL bits which control which error types to
    be reported is done over
    
      /sys/devices/system/machinecheck/
      Б■°Б■─Б■─ machinecheck0
      Б■┌б═б═ Б■°Б■─Б■─ bank0
      Б■┌б═б═ Б■°Б■─Б■─ bank1
      Б■┌б═б═ Б■°Б■─Б■─ bank10
      Б■┌б═б═ Б■°Б■─Б■─ bank11
      ...
    
    sysfs nodes by writing the new bit mask of events to enable.
    
    When the write is accepted, the kernel deletes all current timers and
    reinits all banks.
    
    Doing that in parallel can lead to initializing a timer which is already
    armed and in the timer wheel, i.e., in use already:
    
      ODEBUG: init active (active state 0) object: ffff888063a28000 object
      type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
      WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514
      debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
    
    Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code
    does.
    
    Reported by: Yue Sun <samsun1006219@gmail.com>
    Reported by: xingwei lee <xrivendell7@gmail.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: <stable@kernel.org>
    Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzffRw@mail.gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/mm/pat: fix VM_PAT handling in COW mappings [+ + +]

Author: David Hildenbrand <david@redhat.com>
Date:   Wed Apr 3 23:21:30 2024 +0200

    x86/mm/pat: fix VM_PAT handling in COW mappings
    
    commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 upstream.
    
    PAT handling won't do the right thing in COW mappings: the first PTE (or,
    in fact, all PTEs) can be replaced during write faults to point at anon
    folios.  Reliably recovering the correct PFN and cachemode using
    follow_phys() from PTEs will not work in COW mappings.
    
    Using follow_phys(), we might just get the address+protection of the anon
    folio (which is very wrong), or fail on swap/nonswap entries, failing
    follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
    track_pfn_copy(), not properly calling free_pfn_range().
    
    In free_pfn_range(), we either wouldn't call memtype_free() or would call
    it with the wrong range, possibly leaking memory.
    
    To fix that, let's update follow_phys() to refuse returning anon folios,
    and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
    if we run into that.
    
    We will now properly handle untrack_pfn() with COW mappings, where we
    don't need the cachemode.  We'll have to fail fork()->track_pfn_copy() if
    the first page was replaced by an anon folio, though: we'd have to store
    the cachemode in the VMA to make this work, likely growing the VMA size.
    
    For now, lets keep it simple and let track_pfn_copy() just fail in that
    case: it would have failed in the past with swap/nonswap entries already,
    and it would have done the wrong thing with anon folios.
    
    Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
    
    <--- C reproducer --->
     #include <stdio.h>
     #include <sys/mman.h>
     #include <unistd.h>
     #include <liburing.h>
    
     int main(void)
     {
             struct io_uring_params p = {};
             int ring_fd;
             size_t size;
             char *map;
    
             ring_fd = io_uring_setup(1, &p);
             if (ring_fd < 0) {
                     perror("io_uring_setup");
                     return 1;
             }
             size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    
             /* Map the submission queue ring MAP_PRIVATE */
             map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                        ring_fd, IORING_OFF_SQ_RING);
             if (map == MAP_FAILED) {
                     perror("mmap");
                     return 1;
             }
    
             /* We have at least one page. Let's COW it. */
             *map = 0;
             pause();
             return 0;
     }
    <--- C reproducer --->
    
    On a system with 16 GiB RAM and swap configured:
     # ./iouring &
     # memhog 16G
     # killall iouring
    [  301.552930] ------------[ cut here ]------------
    [  301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
    [  301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
    [  301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
    [  301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
    [  301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
    [  301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
    [  301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
    [  301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
    [  301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
    [  301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
    [  301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
    [  301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
    [  301.564186] FS:  0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
    [  301.564773] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
    [  301.565725] PKRU: 55555554
    [  301.565944] Call Trace:
    [  301.566148]  <TASK>
    [  301.566325]  ? untrack_pfn+0xf4/0x100
    [  301.566618]  ? __warn+0x81/0x130
    [  301.566876]  ? untrack_pfn+0xf4/0x100
    [  301.567163]  ? report_bug+0x171/0x1a0
    [  301.567466]  ? handle_bug+0x3c/0x80
    [  301.567743]  ? exc_invalid_op+0x17/0x70
    [  301.568038]  ? asm_exc_invalid_op+0x1a/0x20
    [  301.568363]  ? untrack_pfn+0xf4/0x100
    [  301.568660]  ? untrack_pfn+0x65/0x100
    [  301.568947]  unmap_single_vma+0xa6/0xe0
    [  301.569247]  unmap_vmas+0xb5/0x190
    [  301.569532]  exit_mmap+0xec/0x340
    [  301.569801]  __mmput+0x3e/0x130
    [  301.570051]  do_exit+0x305/0xaf0
    ...
    
    Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reported-by: Wupeng Ma <mawupeng1@huawei.com>
    Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
    Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
    Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Fri Apr 5 16:46:37 2024 +0200

    x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
    
    commit b377c66ae3509ccea596512d6afb4777711c4870 upstream.
    
    srso_alias_untrain_ret() is special code, even if it is a dummy
    which is called in the !SRSO case, so annotate it like its real
    counterpart, to address the following objtool splat:
    
      vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0
    
    Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4")
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Tue Apr 2 16:05:49 2024 +0200

    x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO
    
    commit 0e110732473e14d6520e49d75d2c88ef7d46fe67 upstream.
    
    The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO
    case is there only for the altenative in CALL_UNTRAIN_RET to have
    a symbol to resolve.
    
    However, testing with kernels which don't have CONFIG_MITIGATION_SRSO
    enabled, leads to the warning in patch_return() to fire:
    
      missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e
      WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826
    
    Put in a plain "ret" there so that gcc doesn't put a return thunk in
    in its place which special and gets checked.
    
    In addition:
    
      ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined!
      make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Chyba 1
      make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2
      make: *** [Makefile:240: __sub-make] Chyba 2
    
    since !SRSO builds would use the dummy return thunk as reported by
    petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679.
    
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.com
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/syscall: Don't force use of indirect calls for system calls [+ + +]

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Apr 3 16:36:44 2024 -0700

    x86/syscall: Don't force use of indirect calls for system calls
    
    commit 1e3ad78334a69b36e107232e337f9d693dcc9df2 upstream.
    
    Make <asm/syscall.h> build a switch statement instead, and the compiler can
    either decide to generate an indirect jump, or - more likely these days due
    to mitigations - just a series of conditional branches.
    
    Yes, the conditional branches also have branch prediction, but the branch
    prediction is much more controlled, in that it just causes speculatively
    running the wrong system call (harmless), rather than speculatively running
    possibly wrong random less controlled code gadgets.
    
    This doesn't mitigate other indirect calls, but the system call indirection
    is the first and most easily triggered case.
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86: set SPECTRE_BHI_ON as default [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Tue Apr 9 19:32:41 2024 +0200

    x86: set SPECTRE_BHI_ON as default
    
    commit 2bb69f5fc72183e1c62547d900f560d0e9334925 upstream.
    
    Part of a merge commit from Linus that adjusted the default setting of
    SPECTRE_BHI_ON.
    
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen-netfront: Add missing skb_mark_for_recycle [+ + +]

Author: Jesper Dangaard Brouer <hawk@kernel.org>
Date:   Wed Mar 27 13:14:56 2024 +0100

    xen-netfront: Add missing skb_mark_for_recycle
    
    commit 037965402a010898d34f4e35327d22c0a95cd51f upstream.
    
    Notice that skb_mark_for_recycle() is introduced later than fixes tag in
    commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling").
    
    It is believed that fixes tag were missing a call to page_pool_release_page()
    between v5.9 to v5.14, after which is should have used skb_mark_for_recycle().
    Since v6.6 the call page_pool_release_page() were removed (in
    commit 535b9c61bdef ("net: page_pool: hide page_pool_release_page()")
    and remaining callers converted (in commit 6bfef2ec0172 ("Merge branch
    'net-page_pool-remove-page_pool_release_page'")).
    
    This leak became visible in v6.8 via commit dba1b8a7ab68 ("mm/page_pool: catch
    page_pool memory leaks").
    
    Cc: stable@vger.kernel.org
    Fixes: 6c5aa6fc4def ("xen networking: add basic XDP support for xen-netfront")
    Reported-by: Leonidas Spyropoulos <artafinde@archlinux.com>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=218654
    Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
    Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Link: https://lore.kernel.org/r/171154167446.2671062.9127105384591237363.stgit@firesoul
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Список изменений в Linux 6.1.85