
How and where the assets are stored? #1

Open
8 of 14 tasks
tehKaiN opened this issue Sep 8, 2021 · 20 comments
Labels
documentation (Improvements or additions to documentation), help wanted (Extra attention is needed)

Comments

Member

tehKaiN commented Sep 8, 2021

This game requires extracting all of the assets from the ROM file. How are they stored?

  • gfx assets
    • viking animations
    • enemy animations
    • hud
    • items
    • font
  • audio assets
    • music
    • sounds
  • level files
    • tilemap
    • tiles
    • scripting?
  • texts

Please don't post original disassembled code or pure extracted assets!

tehKaiN added the documentation and help wanted labels Sep 8, 2021
Member Author

tehKaiN commented Sep 8, 2021

I've loaded the ROM into YY-CHR and loaded the palette from a save state. After scrolling through the data for a while, I found that the viking animation frames, the items, and the whole HUD are stored in plain tile format, so extracting them is not a problem.

[image]

I haven't been able to find the font, enemies, and tiles this way. Then I opened the save state directly in YY-CHR to see whether that data is available anywhere in RAM, and there I found the font:

[image]

so it looks like the data is compressed somehow. I'll need some help to get the decompression routine using a clean-room approach:

  • someone has to disassemble the code and write me a specification on how the compression works
  • I'll then implement the clean decompression routine in C

Member Author

tehKaiN commented Sep 8, 2021

There is a password table in the ROM file in which each password also comes with two bytes of data, and almost every value is bigger than the previous one. Perhaps some kind of relative offset for loading level data?

[image]

tehKaiN changed the title from "How assets are stored?" to "How and where the assets are stored?" Sep 8, 2021

lintbe commented Oct 27, 2021

Compressed data structure for Lost Vikings 2

You can test with the compressed data located at ROM offset 0x0E2583.

0x00 - 0x01 : Size of decompressed data
0x02        : Repeat number bits
0x03 - ...  : Data
...         : Repeat number bits
...         : Data

Get the repeat number bits value (8 bits), take its lowest bit (keep it for later), then shift the repeat number bits right by one.

If the lowest bit is high (value of 1), copy the next byte of compressed data to the decompressed data as is.

If the lowest bit is low:

  • read the next 2 bytes, little endian
  • CopyLoopIndex (16 bit): apply a 0x0FFF mask to those 2 bytes
  • CopyLoopSize (16 bit): apply a 0xF000 mask to those 2 bytes, then:
    • shift right by 12 bits
    • add 0x0003
    • add the value of CopyLoopIndex
    • apply a 0x0FFF mask
  • In a loop, set the decompressed data at its current index to the value of the decompressed data at CopyLoopIndex.
  • Increment CopyLoopIndex and the decompressed data index.
  • Loop until CopyLoopIndex is equal to 0x1000 (in this case, set CopyLoopIndex to 0 and continue looping) or is equal to CopyLoopSize.

Then return to the repeat number, take its lowest bit and shift it right by one.
Do that until the repeat number is equal to or less than 0.

When the repeat number loop is over, read the next repeat number from the compressed data and repeat
the operation until the decompressed data index is equal to or greater than the decompressed data size.

The decompressed data buffer is initialized as 0x1000 bytes of 0x00.
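To double-check the command arithmetic, here is a minimal C sketch of just the back-reference decoding step (the names `decode_backref` and `BackRef` are hypothetical). It mirrors the masks above; since the end index wraps at 0x1000, the number of copied bytes is (end - start) mod 0x1000, i.e. (word >> 12) + 3, so 3 to 18 bytes. The command words quoted later in this thread (e.g. FFEE, F011, 701C, C020) give the same start/end pairs as those logs.

```c
#include <stdint.h>

/* Hypothetical helper: one back-reference command, decoded per the spec above. */
typedef struct {
    uint16_t start; /* CopyLoopIndex: 12-bit index into the 0x1000-byte buffer */
    uint16_t end;   /* CopyLoopSize: index at which copying stops */
} BackRef;

/* lo and hi are the two command bytes in stream order (little endian). */
static BackRef decode_backref(uint8_t lo, uint8_t hi)
{
    uint16_t word = (uint16_t)(lo | (hi << 8));
    BackRef ref;
    ref.start = word & 0x0FFF;
    /* (word & 0xF000) >> 12, + 3, + CopyLoopIndex, then masked to 12 bits */
    ref.end = (uint16_t)((((word & 0xF000u) >> 12) + 3u + ref.start) & 0x0FFFu);
    return ref;
}
```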

Member Author

tehKaiN commented Oct 27, 2021

great stuff! Will add support for some assets to the game asap :)

Member Author

tehKaiN commented Oct 30, 2021

Something doesn't seem to be right.

Assuming the data is at offset 0x0E2583 in the ROM file, the first two bytes (uncompressed data size) decode to 4512. That's bigger than the maximum size of 0x1000. How does that relate to:

decompressed data is initialized with a 0x1000 size full of 0x00 values.

are the bytes 4096 and above not zeroed?

Then, the first byte containing repeat bits is 01011000 - assuming that zeros correspond to unpacking of already stored bytes, this decodes to:

File pos: 0E2585, repeat bits: 01011000
ERR: start idx 4078 should be lower than end idx 0
Decompress cmd: FFEE (1111111111101110), copying decompressed bytes from 4078 to 0 -> copyLoopSize is (15+ 3 + 4078) & 0xFFF = 4096 & 0xFFF = 0
Decompress cmd: F011 (1111000000010001), copying decompressed bytes from 17 to 35
Decompress cmd: 701C (0111000000011100), copying decompressed bytes from 28 to 38
read byte: 05
read byte: 07
Decompress cmd: C020 (1100000000100000), copying decompressed bytes from 32 to 47
read byte: 06

I've tried assuming that 0 means copying a byte as-is and 1 means copying already-decoded bytes, but it still doesn't seem right. I'm betting that I'm doing something terribly wrong. ;)

Member Author

tehKaiN commented Nov 3, 2021

I've fiddled with the code a bit, and it looks like all of the repeat bits should be processed instead of stopping once the right-shifted value becomes zero. I made that change and successfully decoded the Interplay logo tiles from the address you specified. Could you confirm that this is the intended behavior? This way, the first repeat byte translates to:

File pos: 0E2585, repeat bits: 01011000

bit 0 (0): Decompress cmd: FFEE (1111111111101110), copying decompressed bytes from 4078 to 0
bit 1 (0): Decompress cmd: F011 (1111000000010001), copying decompressed bytes from 17 to 35
bit 2 (0): Decompress cmd: 701C (0111000000011100), copying decompressed bytes from 28 to 38
bit 3 (1): read byte: 05
bit 4 (1): read byte: 07
bit 5 (0): Decompress cmd: C020 (1100000000100000), copying decompressed bytes from 32 to 47
bit 6 (1): read byte: 06
bit 7 (0): Decompress cmd: 301C (0011000000011100), copying decompressed bytes from 28 to 34

Regarding the 4.5 KB file not fitting in the 4096-byte buffer, I guess the zeroing of the first bytes is only for the sake of RLE, so that the decompressor can assume that zeros are everywhere and use e.g. the end of the 4096-byte buffer as a source of repeated zeros.
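A minimal sketch of the corrected flag handling, assuming (as in the log above) that bits are consumed least significant first, 1 = literal byte and 0 = back-reference, and that all 8 bits are always used. With the original "stop when the shifted value reaches 0" rule, 0x58 (01011000) is already 0 after bit 6, so the bit-7 back-reference (301C) would be skipped. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: describes one flag byte as 8 characters,
 * 'L' for a literal byte and 'B' for a back-reference, in processing order. */
static void describe_flag_byte(uint8_t flags, char out[9])
{
    for (int bit = 0; bit < 8; bit++) {   /* always all 8 bits, LSB first */
        out[bit] = (flags & 1) ? 'L' : 'B';
        flags >>= 1;
    }
    out[8] = '\0';
}
```

For 0x58 this yields "BBBLLBLB", the same sequence as bits 0-7 in the log above.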


lintbe commented Nov 4, 2021

This seems correct to me ...


lintbe commented Nov 5, 2021

Resource offsets are located at ROM offset 0x050000.
The table seems to end at 0x050553, which means 155 different pointers to resources.
Warning: not all of those resources are compressed; some are, some are not.
Each resource takes 4 bytes describing its ROM address:

the first 2 bytes + 0x8000 are the ROM address (little endian)
the third byte + 0x8A is the ROM bank to be used
the fourth byte seems unused (the data seems padded to make it easier to fetch)
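A minimal C sketch of decoding one table entry under the description above (the function name is hypothetical). For the entry 83 25 12 00 discussed further down, it yields the CPU address 0x9CA583, which is the address the Mesen read breakpoint reports later in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: turns one 4-byte resource table entry into a
 * 24-bit SNES CPU address (bank << 16 | address). */
static uint32_t resource_entry_to_cpu_address(const uint8_t entry[4])
{
    /* bytes 0-1: little-endian address, + 0x8000 */
    uint32_t address = ((uint32_t)entry[0] | ((uint32_t)entry[1] << 8)) + 0x8000u;
    /* byte 2: bank, + 0x8A; byte 3 is ignored (padding) */
    uint32_t bank = (uint32_t)entry[2] + 0x8Au;
    /* Assumption: neither sum carries past 16 / 8 bits, so masking is harmless. */
    return ((bank & 0xFFu) << 16) | (address & 0xFFFFu);
}
```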

Member Author

tehKaiN commented Nov 6, 2021

The data you found earlier is at 0x0E2583 in the ROM file. The only entry in the list with lowest byte 83 is 83 25 12 00. So either the list doesn't cover all assets, or the transform is a bit different, maybe something like the following:

  • get u32 from list - 83 25 12 00 becomes 0x122583
  • add 0x40000 - becomes 0xE22583

but I'm really not sure about it. Some entries look like the offsets I found on my own, and your transform fits them better. I'll try to find and decompress some stuff tomorrow with your transform and we'll see where it takes me.

Member Author

tehKaiN commented Nov 7, 2021

Okay, I guess this may be related to the fact that I don't exactly know how the SNES memory map works.

I set the Mesen debugger to break when ROM address 0x0E2583 is read, and it stopped on a read from 0x9CA583, so your calculation works properly here. I've read that 0x800000 is the start of the memory shadow, so it's the same as 0x1CA583, and the memory view seems to agree, but it's still 0xE8000 off from the ROM file offset. From what I understand, the memory mapping on the SNES can change on the fly, so the safest bet would be:

  • find the place where the rom addr 0x0E2583 is read (read breakpoint)
  • use this breakpoint across the game to read all the CPU addresses
  • for each address, search the ROM for the contents from CPU address, getting the ROM offset in the process.

Kinda tedious, but doable. Or perhaps:

  • use your calc to calculate the CPU address
  • AND by 0x7FFFFF to get rid of shadowing
  • subtract the 0xE8000 and hope for the best

I'll try to go with this approach and see where it will take me.


EDIT: something's not right.

The 0x84A18 from the table is followed by 0x90218; subtracting gives a size of 47104. It's the same uncompressed data I found earlier, and it's at ROM offsets 0x8CA18 - 0x98218.

By following my algorithm:

  • CPU address 0x91CA18 - ok
  • non-shadow: 0x11CA18 - ok
  • subtract: 0x34A18 - doesn't match - here it should be subtracting 0x98000

Not so easy. Any ideas other than going option 1 or disassembling and figuring out how the game maps the memory?


lintbe commented Nov 8, 2021

Here is a small PHP function that converts an address to a ROM offset and works with the Lost Vikings 2 mapping:

function convertAddressToRomOffset(int $baseAddress): int {
    // Drop bank bits 6-7 so mirrored banks (e.g. $80-$BF and $00-$3F) map to the same ROM data
    $baseAddress &= 0x3FFFFF;

    $bank = $baseAddress >> 16;        // bank byte
    $address = $baseAddress & 0xFFFF;  // 16-bit address inside the bank (expected $8000-$FFFF)

    // intdiv() keeps $bank an int; "$bank /= 2" yields a float for odd banks
    if ($bank % 2 === 0) {
        // Even bank: lower 32 KiB of each 64 KiB ROM chunk
        $romOffset = (intdiv($bank, 2) << 16) | ($address - 0x8000);
    } else {
        // Odd bank: upper 32 KiB of each 64 KiB ROM chunk
        $romOffset = (intdiv($bank, 2) << 16) | $address;
    }

    return $romOffset;
}
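For reference, a minimal C sketch of what the PHP function computes (the name `cpu_to_rom_offset` is hypothetical). As far as I can tell it is equivalent, for addresses in $8000-$FFFF, to the usual LoROM formula: every bank exposes 32 KiB of ROM at $8000-$FFFF, so ROM offset = (bank & 0x3F) * 0x8000 + (address - 0x8000). Since the gap between CPU address and ROM offset grows with the bank number, that would also explain why subtracting a single constant couldn't work for every asset.

```c
#include <stdint.h>

/* Sketch: LoROM-style CPU address -> ROM file offset.
 * Bank bits 6-7 are ignored (mirrors), like the 0x3FFFFF mask in the PHP.
 * Assumes (cpu_address & 0xFFFF) >= 0x8000, i.e. the address points into ROM. */
static uint32_t cpu_to_rom_offset(uint32_t cpu_address)
{
    uint32_t bank = (cpu_address >> 16) & 0x3F;
    uint32_t address = cpu_address & 0xFFFF;
    return bank * 0x8000u + (address - 0x8000u);
}
```

With the addresses from this thread, `cpu_to_rom_offset(0x9CA583)` gives 0x0E2583 and `cpu_to_rom_offset(0x91CA18)` gives 0x08CA18, matching the ROM offsets reported above.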

Member Author

tehKaiN commented Nov 8, 2021

Looks good! Works good!

Member Author

tehKaiN commented Nov 13, 2021

I've found a way to guess whether an asset is compressed: I get a rough estimate of its stored size by subtracting its offset in the asset list from the offset of the next asset, then check whether the first u16 (uncompressed size) is bigger than that. If it is, the asset is probably compressed.
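A minimal sketch of that size heuristic (the function name is hypothetical). It assumes the u16 size header is little endian and that assets are stored back to back in list order, so the gap to the next asset is an upper bound on the stored size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rom is the whole ROM image; offset and next_offset are
 * the ROM offsets of this asset and of the next one in the asset list. */
static bool probably_compressed(const uint8_t *rom, size_t offset, size_t next_offset)
{
    if (next_offset < offset + 2)
        return false;                       /* too small to even hold the header */
    size_t stored_size = next_offset - offset;
    uint16_t declared = (uint16_t)(rom[offset] | (rom[offset + 1] << 8));
    return declared > stored_size;          /* unpacks to more than it occupies */
}
```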

Then I started throwing an exception whenever the RLE read start is beyond 0x1000 and bigger than the uncompressed size, because that is probably illegal. All that filtering left me with 197 decompressed assets.

I've looked through all of them with YY-CHR and found that around 40 hold gfx, some of them slightly corrupted (e.g. movable block gfx, pushbutton gfx). I'll look through my code to see if I'm doing something stupid; otherwise there's something wrong with the decompression algorithm or YY-CHR doesn't visualize the graphics properly.

[image]

[image]


lintbe commented Dec 13, 2021

There is a condition I forgot to explain:

We loop until CopyLoopIndex is equal to 0x1000 (in this case, we set copyLoopIndex to 0 and continue looping) or is equal to CopyLoopSize

Member Author

tehKaiN commented Dec 13, 2021

I've checked it on the files listed above and it works, thanks! I'll now review all the remaining assets and see if anything else messes up.

Member Author

tehKaiN commented Dec 13, 2021

I've successfully decoded most of the NPC and enemy gfx. Tiles are still elusive, but perhaps that's due to the wrong preview tool and/or palette.

I've just discovered that there are 47 files starting with 00 01 f8 00. I'm going to treat them as a separate file type and give them the .1F8 extension to tell them apart from other files. I still don't have a clue what they contain.

Member Author

tehKaiN commented Jun 12, 2024

Last time, I got stuck decompressing larger images, e.g. the continue screen chars. The bigger the image, the more corrupted it got.

[image]

Found the bug! The decompression routine was flawed. I took a look at the decompression routine for LV1 at https://github.com/RyanMallon/TheLostVikingsTools/blob/master/liblv/lv_compress.c#L76 and apparently it uses a very similar algorithm, but whenever it copies past bytes, it reads them from a separate 4096-byte circular lookup buffer. After doing the same, it worked like a charm.

(different palette btw but image looks less distorted)
[image]
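Putting the pieces of this thread together, here is a minimal C sketch of the decompressor with a separate 4096-byte circular window (function and macro names are hypothetical, and this is not a copy of lv_compress.c). Assumptions: the u16 size header is little endian, the window write position starts at 0 (which matches the earlier direct-indexing version that decoded small files correctly), and decoding stops as soon as the declared size is reached, even in the middle of a copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LZ_WINDOW_SIZE 0x1000u  /* 4096-byte circular history */

/* Returns the number of bytes written to out, or 0 on truncated input
 * or when the declared size does not fit in out_cap. */
static size_t lv2_decompress(const uint8_t *src, size_t src_len,
                             uint8_t *out, size_t out_cap)
{
    uint8_t window[LZ_WINDOW_SIZE];
    memset(window, 0, sizeof window);   /* history starts as all zeros */
    uint16_t wpos = 0;                  /* assumption: first write goes to index 0 */

    if (src_len < 2)
        return 0;
    size_t size = (size_t)src[0] | ((size_t)src[1] << 8);  /* decompressed size */
    if (size > out_cap)
        return 0;

    size_t in = 2, produced = 0;
    while (produced < size) {
        if (in >= src_len)
            return 0;
        uint8_t flags = src[in++];

        /* All 8 flag bits are used, least significant first. */
        for (int bit = 0; bit < 8 && produced < size; bit++, flags >>= 1) {
            if (flags & 1) {
                /* 1 = literal byte */
                if (in >= src_len)
                    return 0;
                uint8_t b = src[in++];
                out[produced++] = b;
                window[wpos] = b;
                wpos = (wpos + 1) & (LZ_WINDOW_SIZE - 1);
            } else {
                /* 0 = back-reference: low 12 bits = window index, high 4 bits = length - 3 */
                if (in + 1 >= src_len)
                    return 0;
                uint16_t cmd = (uint16_t)(src[in] | (src[in + 1] << 8));
                in += 2;
                uint16_t rpos = cmd & 0x0FFF;
                unsigned len = (cmd >> 12) + 3u;
                for (unsigned k = 0; k < len && produced < size; k++) {
                    uint8_t b = window[rpos];
                    rpos = (rpos + 1) & (LZ_WINDOW_SIZE - 1);  /* wrap at 0x1000 */
                    out[produced++] = b;
                    window[wpos] = b;
                    wpos = (wpos + 1) & (LZ_WINDOW_SIZE - 1);
                }
            }
        }
    }
    return produced;
}
```

Reading history through `window` rather than `out` is what keeps back-references pointing at the most recent 4096 bytes once the output grows past 0x1000. Note that a return value of 0 is ambiguous for an asset with a declared size of 0; a real tool may want a separate error flag.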

I wonder what else can be recycled from LV1. Perhaps even its virtual machine? That would be a big deal, as it would allow running the original logic on an Amiga, either by interpreting it on the fly or by compiling it to native code ahead of time.

Member Author

tehKaiN commented Jul 14, 2024

So far I've managed to decode most of the data files, including palettes, tilesets, tilemaps and level definitions.

[image]

Three big areas remain unsolved:

  • sound effects and music
  • dialogue / help texts
  • entity scripts

I've found sounds using the snessor and BRR finder, but that method guesses where the samples start and stop, and I have no hard references for either where each sound is used or what playback parameters should be set. Also, it would be amazing to find the code responsible for music playback (e.g. a custom tracker/sequencer).

Regarding scripting, I thought that perhaps LV2 uses a virtual machine and bytecode-based scripts like the PC version of LV1 (https://ryiron.wordpress.com/2017/02/01/finding-the-lost-vikings-reversing-a-virtual-machine/), but that doesn't seem to be the case, and other platform versions of LV2 use an entirely different file organization/engine. So making the game logic very accurate looks like it would require disassembling some 65C816 code.

Each level has a list of objects to be spawned, and each object has: position, center point, class id, and two unknown u16 values: unk1 and unk2. The unk2 value seems to be behavior argument/parameter for object, because e.g. pickups share the same class, but unk2 corresponds to the item kind, e.g. 1 is a shield pickup, and 2 is a bomb, etc. Same with switches/doors/keyholes - unk2 indicates a link between them.

There are around 164 object classes to reverse, and according to the LV1 reversing docs, they might share scripts (e.g. patrolling enemies all have different classes, but their behavior is mostly the same), since the script is determined by one of the class definition fields. The other fields might be: collision rectangle dimensions and the index of the datafile containing the graphics frames.

Even finding the class definition list would be a big help, because it would show the relations to gfx files as well as the true box sizes. I currently guesstimate them using center point sizes, which doesn't always seem to be correct. One way to find it:

  • in main menu, enter STRT code
  • this should decompress and load file index 28 containing the first level def (the index is stored next to the password in ROM file, so that position could be breakpointed)
  • inspect how level def is loaded, with emphasis on object spawning
  • inspect what the game does with class id 31 (game's help box) at this point, where does it find info on its size/behavior/graphics

This would also allow finding where the game texts are stored, because the help box triggers given text on interaction.

Member Author

tehKaiN commented Jul 24, 2024

The game seems to have very little code (see screenshot). Perhaps, in the end, I'm not extracting all the files, they are split into multiple packs, and behavior bytecode is still lurking somewhere? Or perhaps the entity behaviors for everything except the vikings are heavily recycled and thus have a very small footprint.

I've run the full PAL version of the game through Diztinguish in case someone wants to take a stab at disassembling it and writing a spec for the missing features.

[image]

lv2_diz.zip

Member Author

tehKaiN commented Oct 1, 2024

One thing that wasn't taken into account in the list of stuff to find is the texts. I couldn't find them in plain encoding anywhere in the ROM, neither in ASCII nor as indices into the game's font, so some kind of compression must be involved.

Here's what I got from Runic in sneslab:

Okay well $2A says the buffer starts at 7E3280 (and that is hardcoded actually, whereas a lot of other stuff is set from a table or header somewhere)
$1A8B has Y $0002 in $30
Data is loaded from pointer at $64 which comes from $0DB9 table (another ptr table at $0E09 for the same whatever)
$BE is only other ptr which pulls into $26 ptr along with $C0 (iteration was $9FB3BF)

81C031
$19F364 is loaded, modified, then put in X.... that doesn't make sense, that's a constant...
1FB3BF
$C2 is used to select between these two tables:
9F366
9F3E4
$26 might be source file (1FB3BF) seems
1fb3c0
1fb3bf

there is one static table used by $26 at:
0A8000
oh no
src: 7E31000
dst: 7E32000
that means the file/buffer ptr is sometimes contained within another file probably? unless it calculates it

7E3000 is another static src
02AAAA was a calculated src (so we do have calc'd src)
1F8000 table from $BE is stored in $26 (this is the 1FB3BF)
16B418

do a calc at 00F467 to add ROM offset and then store in $26 (with bank $02 hardcoded)
$98 x $9c = index
index oh okay well there we go
Not positive. But I think I covered all the sources. But bank 2 is a table of src ptrs at the start.
And that seems like a likely source (either that or the fact that the ptr can be referenced inside another file potentially)
1FB3BF is the only other src ptr that occurs as often as something from the start of bank 2
(often does not mean I found "here be strings", but let me know if you didn't find the data blobs pointed to by those table)

I don't understand half of it, except for the bank 2 thingy. If I'm correct, this corresponds to offset 0x10000 in the ROM file, and apparently there are some 25 byte-long entries in there, and there are even some candidates for pointers to the data:

[image]

The marked address 83077301 converts to ROM position B7301, which collides with previously detected data, so it can't be valid. However, 00839773 is at ROM position 19773, which is a better result because it doesn't collide with anything.

Looking for nearby addresses turns up:

83B404 -> ROM 1B404
83B443 -> ROM 1B443

This gives sensible bounds for the entry at 1B404, which is apparently 63 bytes long. How to decode it is a mystery to me; most probably it's compressed somehow.
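The conversions above are consistent with LoROM-style mapping (32 KiB of ROM per bank at $8000-$FFFF), and the entry size bound is just the difference between consecutive converted pointers. A minimal sketch, with hypothetical function names, assuming the entry at the next pointer is stored directly after the current one:

```c
#include <stdint.h>

/* LoROM-style CPU address -> ROM offset; assumes (cpu_address & 0xFFFF) >= 0x8000. */
static uint32_t lorom_to_rom_offset(uint32_t cpu_address)
{
    return ((cpu_address >> 16) & 0x3F) * 0x8000u + (cpu_address & 0x7FFF);
}

/* Upper bound on the size of the entry at ptr, assuming the entry at
 * next_ptr follows it directly in ROM. */
static uint32_t entry_size_bound(uint32_t ptr, uint32_t next_ptr)
{
    return lorom_to_rom_offset(next_ptr) - lorom_to_rom_offset(ptr);
}
```

`lorom_to_rom_offset(0x83B404)` gives 0x1B404, `lorom_to_rom_offset(0x839773)` gives 0x19773, and `entry_size_bound(0x83B404, 0x83B443)` gives 63, matching the numbers above.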

My progress on reading those texts directly from ROM is stalled for now, so I guess I'll need to rip them from RAM somehow.
