[FAT32] Avoid indexed access with base in IO range on 65C816 #355

Merged


mooinglemur
Collaborator

Blockwise reads and writes in the banked RAM area that wrap to the next RAM bank partway through the operation formerly continued using indirect indexed mode with a base in $9Fxx, where .Y started at the offset that put the first byte at $A000.

This change prevents that condition on the 65C816.
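To make the condition concrete, here is a small Python model (not 65C816 code) of the extra read that each `sta (ptr),y` issues before its write, based on the cycle behaviour described in the backstory. It compares the old continuation, where the pointer stays at $9Fxx and .Y starts at the offset that lands the first byte at $A000, with a hypothetical continuation that restarts from a base of $A000 with .Y = 0. The function names and the `IO_PAGE` constant are illustrative assumptions, and the loop actually merged in this PR may be structured differently.

```python
# Illustrative model only: these names are hypothetical, not from the FAT32 code.
IO_PAGE = 0x9F  # high byte of the $9F00-$9FFF IO range


def store_dummy_reads(base: int, start_y: int, count: int) -> list[int]:
    """Return the address of the extra read cycle for each `sta (ptr),y`
    with ptr = base and .Y running from start_y for count bytes.

    The extra read uses .Y added to the low byte only, i.e. without the
    carry into the high byte.
    """
    reads = []
    for y in range(start_y, start_y + count):
        if not 0 <= y <= 0xFF:
            raise ValueError(".Y is an 8-bit index in this model")
        reads.append((base & 0xFF00) | ((base + y) & 0x00FF))
    return reads


def touches_io_page(addresses: list[int]) -> bool:
    """True if any of the given addresses falls in the $9Fxx IO page."""
    return any((addr >> 8) == IO_PAGE for addr in addresses)


# Old approach: base $9F80, .Y from $80, so the first byte is written to $A000.
old = store_dummy_reads(0x9F80, 0x80, 0x80)
# Hypothetical new approach: base $A000, .Y from $00.
new = store_dummy_reads(0xA000, 0x00, 0x80)

print(hex(old[0]), touches_io_page(old))  # 0x9f00 True
print(hex(new[0]), touches_io_page(new))  # 0xa000 False
```

With the $9Fxx base, every extra read lands in the IO page; restarting from $A000 keeps them all in banked RAM.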

Backstory:

On the 65C816, for indexed reads and writes, the cycle immediately before the valid access is a read from the indexed address computed without carry.

For instance:

ldy #$80
lda $9f80,y

For the lda instruction on the 65C816, just like on the original 6502, the CPU first adds .Y to the low byte of the address without carry and does a read cycle, here from $9f00. Because the page wrapped, the CPU does another cycle: it adds the carry to the high byte, performs the actual read from $a000, and continues execution.
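As a rough illustration, the read pattern just described can be modelled in Python (again, not 65C816 code, and assuming 8-bit index registers). The function name is hypothetical, and it lists only the data-side bus cycles after the operand has been fetched.

```python
def indexed_read_cycles(base: int, y: int) -> list[tuple[str, int]]:
    """Model the data-side bus cycles of `lda base,y` as described above."""
    uncarried = (base & 0xFF00) | ((base + y) & 0x00FF)  # .Y added without carry
    effective = (base + y) & 0xFFFF                       # carry applied
    if uncarried == effective:
        # No page wrap: the first read is already the real one.
        return [("read", effective)]
    # Page wrap: an extra read at the uncarried address, then the real read.
    return [("dummy read", uncarried), ("read", effective)]


cycles = indexed_read_cycles(0x9F80, 0x80)
print([(kind, hex(addr)) for kind, addr in cycles])
# [('dummy read', '0x9f00'), ('read', '0xa000')]
```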

In the FAT32 code, this is done with indirect indexed addressing, but the effect is the same:

lda #$9f
sta ptr+1
lda #$80
sta ptr
ldy #$80
lda (ptr),y

The lda instruction on the '816 dereferences ptr and then performs exactly the same read cycles as above.
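The indirect case can be modelled the same way in Python, assuming a direct page register of zero (so `ptr` is a plain zero-page address) and 8-bit index registers. The pointer location $22 and the `mem` dictionary are hypothetical; the pointer holds $9F80 so that indexing by $80 crosses into $A000.

```python
def indirect_indexed_read_cycles(mem: dict[int, int], ptr: int, y: int) -> list[tuple[str, int]]:
    """Model `lda (ptr),y`: fetch the little-endian pointer, then follow the
    same indexed read pattern as absolute,Y."""
    cycles = [("pointer read", ptr), ("pointer read", ptr + 1)]
    base = mem[ptr] | (mem[ptr + 1] << 8)
    uncarried = (base & 0xFF00) | ((base + y) & 0x00FF)  # .Y added without carry
    effective = (base + y) & 0xFFFF                       # carry applied
    if uncarried != effective:
        cycles.append(("dummy read", uncarried))  # only when the page wraps
    cycles.append(("read", effective))
    return cycles


mem = {0x22: 0x80, 0x23: 0x9F}  # hypothetical ptr at $22 holding $9F80
for kind, addr in indirect_indexed_read_cycles(mem, 0x22, 0x80):
    print(kind, hex(addr))
```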

Stores follow a similar but slightly different pattern:

ldy #$80
lda #$69
sta $9f80,y

On the first cycle after the operand is fetched, just like above, .Y is added to the address without carry, and the CPU always does a read cycle on the resulting address ($9f00 here), even if the page did not wrap. On the next cycle, any carry is applied and the CPU does the write cycle on the effective address ($a000).

Indirect indexed stores have the same cycle pattern after dereferencing the pointer.

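Of note, an indexed write always has that extra read cycle, even when the page does not wrap. The CPU cannot write on that cycle, because in the case of a page wrap the write would hit the wrong address, so it reads instead and assumes the read has no side effect. In the $9Fxx IO range that assumption does not hold: the extra read can land on an IO register where reading may have a side effect.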
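A matching Python sketch for stores (hypothetical function name, same 8-bit index assumption as above) shows that the extra read happens on every indexed store, whether or not the page wraps, and only then the write.

```python
def indexed_write_cycles(base: int, y: int) -> list[tuple[str, int]]:
    """Model the data-side bus cycles of `sta base,y` as described above."""
    uncarried = (base & 0xFF00) | ((base + y) & 0x00FF)  # .Y added without carry
    effective = (base + y) & 0xFFFF                       # carry applied
    # At this point the carry has not been applied yet, so the CPU reads
    # instead of writing; a write here would hit the wrong address on a wrap.
    return [("dummy read", uncarried), ("write", effective)]


print(indexed_write_cycles(0x9F80, 0x80))  # wraps: extra read at $9F00
print(indexed_write_cycles(0xA000, 0x05))  # no wrap: extra read at $A005
```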

Collaborator

Fulgen301 left a comment


Before the branch, the code loads $9F into fat32_ptr+1, then backs it up in @816_9f_page, overwrites it with $A0, and later restores it back to $9F. This feels unnecessary.

Regarding your question on Discord, I'd just keep the new version that works with the 65C816 for both CPUs - the loop is faster, and it reduces the code complexity significantly.

@mooinglemur
Collaborator Author

For the first point, it made sense to set fat32_ptr+1 if we're going to use the old loop, but if we always use the new loop after the wrap, then there's no good reason. We can just set it to #$9f at the very end before exiting the section. I'll try to update it later today.

mooinglemur merged commit 3ee434c into X16Community:master on Aug 28, 2024
1 check passed