forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixup! drm/asahi: Add the Asahi driver for Apple AGX GPUs #135
Open
jannau
wants to merge
562
commits into
AsahiLinux:gpu/rust-wip
Choose a base branch
from
jannau:gpu-gem-bind-log-fix
base: gpu/rust-wip
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Upgrade the Rust version from 1.62.0 to 1.66.0. The overwhelming majority of the commit is about upgrading our `alloc` fork to the new version from upstream [1]. License has not changed [2][3] (there were changes in the `COPYRIGHT` file, but unrelated to `alloc`). As in the previous version upgrades (done out of tree so far), upgrading `alloc` requires checking that our small additions (`try_*`) still match their original (non`-try_*`) versions. With this version upgrade, the following unstable Rust features were stabilized: `bench_black_box` (1.66.0), `const_ptr_offset_from` (1.65.0), `core_ffi_c` (1.64.0) and `generic_associated_types` (1.65.0). Thus remove them. This also implies that only two unstable features remain allowed for non-`rust/` code: `allocator_api` and `const_refs_to_cell`. There are some new Clippy warnings that we are triggering (i.e. introduced since 1.62.0), as well as a few others that were not triggered before, thus allow them in this commit and clean up or remove them as needed later on: - `borrow_deref_ref` (new in 1.63.0). - `explicit_auto_deref` (new in 1.64.0). - `bool_to_int_with_if` (new in 1.65.0). - `needless_borrow`. - `type_complexity`. - `unnecessary_cast` (allowed only on `CONFIG_ARM`). Furthermore, `rustdoc` lint `broken_intra_doc_links` is triggering on links pointing to `macro_export` `macro_rules` defined in the same module (i.e. appearing in the crate root). However, even if the warning appears, the link still gets generated like in previous versions, thus it is a bit confusing. An issue has been opened upstream [4], and it appears that the link still being generated was a compatibility measure, thus we will need to adapt to the new behavior (done in the next patch). In addition, there is an added `#[const_trait]` attribute in `RawDeviceId`, due to 1.66.0's PR "Require `#[const_trait]` on `Trait` for `impl const Trait`") [5]. Finally, the `-Aunused-imports` was added for compiling `core`. This was fixed upstream for 1.67.0 in PR "Fix warning when libcore is compiled with no_fp_fmt_parse" [6], and prevented for the future for that `cfg` in PR "core: ensure `no_fp_fmt_parse builds` are warning-free" [7]. Reviewed-by: Björn Roy Baron <[email protected]> Reviewed-by: Martin Rodriguez Reboredo <[email protected]> Tested-by: Martin Rodriguez Reboredo <[email protected]> Reviewed-by: Gary Guo <[email protected]> Reviewed-by: Vincenzo Palazzo <[email protected]> Reviewed-by: Alice Ferrazzi <[email protected]> Tested-by: Alice Ferrazzi <[email protected]> Reviewed-by: Neal Gompa <[email protected]> Tested-by: Neal Gompa <[email protected]> Link: Rust-for-Linux#947 Link: https://github.com/rust-lang/rust/tree/1.66.0/library/alloc/src [1] Link: https://github.com/rust-lang/rust/blob/1.66.0/library/alloc/Cargo.toml#L4 [2] Link: https://github.com/rust-lang/rust/blob/1.66.0/COPYRIGHT [3] Link: rust-lang/rust#106142 [4] Link: rust-lang/rust#100982 [5] Link: rust-lang/rust#105434 [6] Link: rust-lang/rust#105811 [7] Signed-off-by: Miguel Ojeda <[email protected]>
Commit reference: 3dfc5eb Signed-off-by: Asahi Lina <[email protected]>
Commit reference: 3dfc5eb
Co-developed-by: Wedson Almeida Filho <[email protected]> Signed-off-by: Wedson Almeida Filho <[email protected]> Signed-off-by: Asahi Lina <[email protected]>
Commit reference: 3dfc5eb
Commit reference: 3dfc5eb Signed-off-by: Asahi Lina <[email protected]>
Commit reference: 3dfc5eb
Commit reference: 3dfc5eb
Commit reference: 3dfc5eb
Commit reference: 3dfc5eb
This abstraction enables Rust drivers to walk Device Tree nodes and query their properties. Signed-off-by: Asahi Lina <[email protected]>
…ation In order for modpost to work and correctly generate module aliases from device ID tables, it needs those tables to exist as global symbols with a specific name. Additionally, modpost checks the size of the symbol, so it cannot contain trailing data. To support this, split IdArrayIds out of IdArray. The former contains just the IDs. Then split out the device table definition macro from the macro that defines the device table for a given bus driver, and add another macro to declare a device table as a module device table. Drivers can now define their ID table once, and then specify that it should be used for both the driver and the module: // Generic OF Device ID table. kernel::define_of_id_table! {ASAHI_ID_TABLE, &'static hw::HwConfig, [ (of::DeviceId::Compatible(b"apple,agx-t8103"), Some(&hw::t8103::HWCONFIG)), (of::DeviceId::Compatible(b"apple,agx-t8112"), Some(&hw::t8112::HWCONFIG)), // ... ]} /// Platform Driver implementation for `AsahiDriver`. impl platform::Driver for AsahiDriver { /// Data associated with each hardware ID. type IdInfo = &'static hw::HwConfig; // Assign the above OF ID table to this driver. kernel::driver_of_id_table!(ASAHI_ID_TABLE); // ... } // Export the OF ID table as a module ID table, to make modpost/autoloading work. kernel::module_of_id_table!(MOD_TABLE, ASAHI_ID_TABLE); Signed-off-by: Asahi Lina <[email protected]>
This patch adds a logic similar to `devm_platform_ioremap_resource` function adding: - `IoResource` enumerated type that groups the `IORESOURCE_*` macros. - `get_resource()` method that is a binding of `platform_get_resource` - `ioremap_resource` that is newly written method similar to `devm_platform_ioremap_resource`. Lina: Removed `bit` dependency and rebased Co-developed-by: Asahi Lina <[email protected]> Signed-off-by: Maciej Falkowski <[email protected]>
Allows drivers to configure the DMA masks for a device. Implemented here, not in device, because it requires a mutable platform device reference this way (device::Device is a safely clonable reference). Signed-off-by: Asahi Lina <[email protected]>
Apple SoCs require non-posted mappings for MMIO, and this is automatically handled by devm_ioremap_resource() and friends via a resource flag. Implement the same logic in kernel::io_mem, so it can work the same way. Signed-off-by: Asahi Lina <[email protected]>
Commit reference: 3dfc5eb
Commit reference: 3dfc5eb
TODO: This isn't abstracted properly yet Signed-off-by: Asahi Lina <[email protected]>
Signed-off-by: Asahi Lina <[email protected]>
Signed-off-by: Asahi Lina <[email protected]>
Commit reference: 3dfc5eb
Commit reference: 3dfc5eb
Commit reference: 3dfc5eb
Apple Silicon SoCs (M1, M2, etc.) have a GPU with an ARM64 firmware coprocessor. The firmware and the GPU share page tables in the standard ARM64 format (the firmware literally sets the base as its TTBR0/1 registers). TTBR0 covers the low half of the address space and is intended to be per-GPU-VM (GPU user mappings and kernel-managed buffers), while TTBR1 covers the upper half and is global (firmware code, data, management structures shared with the AP, and a few GPU-accessible data structures). In typical Apple fashion, the permissions are interpreted differently from traditional ARM PTEs. By default, firmware mappings use Apple SPRR permission remapping. The firmware only uses that for its own code/data/MMIO mappings, and those pages are not accessible by the GPU hardware. We never need to touch/manage these mappings, so this patch does not support them. When a specific bit is set in the PTEs, permissions switch to a different scheme which supports various combinations of firmware/GPU access. This is the mode intended to be used by AP GPU drivers, and what we implement here. The prot bits are interpreted as follows: - IOMMU_READ and IOMMU_WRITE have the usual meaning. - IOMMU_PRIV creates firmware-only mappings (no GPU access) - IOMMU_NOEXEC creates GPU-only structures (no FW access) - Otherwise structures are accessible by both GPU and FW - IOMMU_MMIO creates Device mappings for firmware - IOMMU_CACHE creates Normal-NC mappings for firmware (cache-coherent from the point of view of the AP, but slower) - Otherwise creates Normal mappings for firmware (this requires manual cache management on the firmware side, as it is not coherent with the SoC fabric) GPU-only mappings (textures/etc) are expected to use IOMMU_CACHE and are seemingly coherent with the CPU (or otherwise the firmware/GPU already issue the required cache management operations when correctly configured). There is a GPU-RO/FW-RW mode, but it is not currently implemented (it doesn't seem to be very useful for the driver). There seems to be no real noexec control (i.e. for shaders) on the GPU side. All of these mappings are implicitly noexec for the firmware. Drivers are expected to fully manage per-user (TTBR0) page tables, but ownership of shared kernel (TTBR1) page tables is shared between the firmware and the AP OS. We handle this by simply using a smaller IAS to drop down one level of page tables, so the driver can install a PTE in the top-level (firmware-initialized) page table directly and just add an offset to the VAs passed into the io_pgtable code. This avoids having to have any special handling for this here. The firmware-relevant data structures are small, so we do not expect to ever require more VA space than one top-level PTE covers (IAS=36 for the next level, 64 GiB). Only 16K page mode is supported. The coprocessor MMU supports huge pages as usual for ARM64, but the GPU MMU does not, so we do not enable them. Signed-off-by: Asahi Lina <[email protected]>
Signed-off-by: Asahi Lina <[email protected]>
DRM drivers need to be able to declare which driver-specific ioctls they support. This abstraction adds the required types and a helper macro to generate the ioctl definition inside the DRM driver. Note that this macro is not usable until further bits of the abstraction are in place (but it will not fail to compile on its own, if not called). Signed-off-by: Asahi Lina <[email protected]>
Add the initial abstractions for DRM drivers and devices. These go together in one commit since they are fairly tightly coupled types. A few things have been stubbed out, to be implemented as further bits of the DRM subsystem are introduced. Signed-off-by: Asahi Lina <[email protected]>
A DRM File is the DRM counterpart to a kernel file structure, representing an open DRM file descriptor. Add a Rust abstraction to allow drivers to implement their own File types that implement the DriverFile trait. Signed-off-by: Asahi Lina <[email protected]>
The DRM GEM subsystem is the DRM memory management subsystem used by most modern drivers. Add a Rust abstraction to allow Rust DRM driver implementations to use it. Signed-off-by: Asahi Lina <[email protected]>
There doesn't seem to be a way for the Rust bindings to get a compile-time constant reference to drm_gem_shmem_vm_ops, so we need to duplicate that structure in Rust... this isn't nice... Signed-off-by: Asahi Lina <[email protected]>
asahilina
force-pushed
the
gpu/rust-wip
branch
8 times, most recently
from
May 28, 2023 14:15
69330aa
to
98f5852
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
2 times, most recently
from
June 15, 2023 04:11
51a96a2
to
89651c2
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
2 times, most recently
from
August 3, 2023 10:24
b3002e4
to
5c5368d
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
from
August 21, 2023 04:39
368ae7c
to
9a9a94c
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
2 times, most recently
from
September 9, 2023 03:33
77166d0
to
9c6bcb8
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
from
November 8, 2023 09:24
9c6bcb8
to
0f47112
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
from
November 22, 2023 05:15
0a19d0f
to
386aa6f
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
2 times, most recently
from
January 17, 2024 08:25
499c582
to
5ccaf76
Compare
marcan
pushed a commit
that referenced
this pull request
Jan 19, 2024
syzkaller report: kernel BUG at net/core/skbuff.c:3452! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452) Call Trace: icmp_glue_bits (net/ipv4/icmp.c:357) __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165) ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341) icmp_push_reply (net/ipv4/icmp.c:370) __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772) ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577) __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295) ip_output (net/ipv4/ip_output.c:427) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387) tcp_retransmit_skb (net/ipv4/tcp_output.c:3404) tcp_retransmit_timer (net/ipv4/tcp_timer.c:604) tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716) The panic issue was trigered by tcp simultaneous initiation. The initiation process is as follows: TCP A TCP B 1. CLOSED CLOSED 2. SYN-SENT --> <SEQ=100><CTL=SYN> ... 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... // TCP B: not send challenge ack for ack limit or packet loss // TCP A: close tcp_close tcp_send_fin if (!tskb && tcp_under_memory_pressure(sk)) tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; // set FIN flag 6. FIN_WAIT_1 --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // TCP B: send challenge ack to SYN_FIN_ACK 7. ... <SEQ=301><ACK=101><CTL=ACK> <-- SYN-RECEIVED //challenge ack // TCP A: <SND.UNA=101> 8. FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic __tcp_retransmit_skb //skb->len=0 tcp_trim_head len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100 __pskb_trim_head skb->data_len -= len // skb->len=-1, wrap around ... ... ip_fragment icmp_glue_bits //BUG_ON If we use tcp_trim_head() to remove acked SYN from packet that contains data or other flags, skb->len will be incorrectly decremented. We can remove SYN flag that has been acked from rtx_queue earlier than tcp_trim_head(), which can fix the problem mentioned above. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Co-developed-by: Eric Dumazet <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Dong Chenchen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
asahilina
force-pushed
the
gpu/rust-wip
branch
3 times, most recently
from
May 10, 2024 12:25
22f3e5c
to
1213cf2
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
2 times, most recently
from
July 17, 2024 10:22
38f6258
to
01f6f6f
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
2 times, most recently
from
December 9, 2024 21:40
3c4cf32
to
0b31a16
Compare
asahilina
force-pushed
the
gpu/rust-wip
branch
2 times, most recently
from
January 20, 2025 20:06
4915ecf
to
f6f35c0
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.