Add initial support for Core Audio Format / CAF files #232

irh · 2023-09-19T15:38:02Z

This PR introduces support for decoding .caf files.

The implementation is derived from the spec, and tested against a local collection of audio files that use various encoding formats.

I haven't added decoding support for all chunk types, but hopefully this PR acts as a good starting point for further development?

Things seem to be working correctly, but please let me know if I've misunderstood some aspect of the format reader API, I'll be happy to make changes as needed.

dedobbin · 2024-01-07T19:26:57Z

I tested it with files from https://filesamples.com/formats/caf and https://www.dwsamplefiles.com/download-caf-sample-files/.
They all play correctly in symphonia-play, audio seems perfect when compared to Audacity. Track length is also correctly displayed.
Also all pass symphonia-check with default settings.

Code also looks good to me.

Good stuff.

pdeljanov · 2024-01-09T02:01:16Z

Thanks for submitting this, I can see a lot of work was put into it and we'd be happy to have it.

Generally, I think it's okay to pull in a decoder/demuxer with a minimum set of functionality. It looks like you have PCM encapsulation down, so this would be a good start.

Unfortunately, I have no knowledge or experience with CAF, so would you be willing to address any bugs that come up with it (we could assign them to you)? Also, would you like to continue extending the support to other encapsulated codecs?

irh · 2024-01-09T09:47:54Z

Hi @dedobbin + @pdeljanov, thanks for taking a look, and I'm glad to hear that you'd be happy to accept the PR.

> Also, would you like to continue extending the support to other encapsulated codecs?

Compressed codec support is implemented (see the last commit), but maybe I misunderstood something?

What's missing from this implementation is support for metadata chunks like markers or annotations, but I wasn't sure which features would be available through Symphonia, and I figured that support should be added with specific use cases in mind.

> Unfortunately, I have no knowledge or experience with CAF, so would you be willing to address any bugs that come up with it (we could assign them to you)?

Sure, I'd be happy to take a look if/when bugs come up, and I can review PRs with feature additions if you like.

irh · 2024-01-09T20:22:59Z

@dedobbin I rebased the branch onto the latest master because it's been a few months (things still seem to be working correctly!), and I also fixed some clippy warnings.

pdeljanov · 2024-01-09T22:48:05Z

Just a heads up, the format check is failing because our config requires the nightly toolchain to adjust the bracket style. If you install the nightly toolchain you can use it by adding +nightly after cargo, e.g., cargo +nightly fmt.

> > Also, would you like to continue extending the support to other encapsulated codecs?
>
> Compressed codec support is implemented (see the last commit), but maybe I misunderstood something?
>
> What's missing from this implementation is support for metadata chunks like markers or annotations, but I wasn't sure which features would be available through Symphonia, and I figured that support should be added with specific use cases in mind.

Got it! I probably just misunderstood what you meant!

irh · 2024-01-10T11:05:46Z

@pdeljanov I've run cargo +nightly fmt and pushed.

pdeljanov

I've completed a more thorough review of this PR. While going over this I used Apple's specification as a reference.

Overall, I think this is very nicely done! Hopefully we can get this in very soon. 😄

Most of my concerns are related to data validation. The media stream should be considered untrustworthy and even malicious. Therefore, it's important we don't blindly trust what is being read. Less sinister scenarios could include damaged files. Ultimately, we don't ever want to panic or get thrown into an infinite loop due to bad input. Consider checking out the fuzzing setup used for MP3 if you really want to stress your code. That being said, I'm not going to gate this review on fuzz tests.
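
As an illustration of the "never panic, never loop forever" rule, here is a minimal sketch of a bounded reader for a base-128 variable-length integer (7 payload bits per byte, high bit set on every byte except the last). The function name and the use of `std::io` are assumptions made for this example; it is not the code from the PR, which reads through Symphonia's own stream types.

```rust
use std::io::{self, Read};

/// Hypothetical helper: decodes a variable-length integer, most significant
/// group first, and refuses to read more bytes than a u64 can hold.
fn read_variable_length_u64<R: Read>(reader: &mut R) -> io::Result<u64> {
    // A u64 holds at most ten 7-bit groups, so the loop has a hard upper bound.
    const MAX_BYTES: usize = 10;

    let mut value: u64 = 0;
    for _ in 0..MAX_BYTES {
        let mut byte = [0u8; 1];
        // Truncated input surfaces as an UnexpectedEof error, not a panic.
        reader.read_exact(&mut byte)?;

        // The top 7 bits must be clear before shifting, otherwise bits are lost.
        if value >> 57 != 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "caf: variable-length integer overflows 64 bits",
            ));
        }
        value = (value << 7) | u64::from(byte[0] & 0x7f);

        // A clear high bit marks the final byte.
        if byte[0] & 0x80 == 0 {
            return Ok(value);
        }
    }

    Err(io::Error::new(
        io::ErrorKind::InvalidData,
        "caf: variable-length integer is too long",
    ))
}
```

With this shape, the two bytes 0x81 0x00 decode to 128, while a stream consisting only of 0xff bytes returns an error after at most ten reads instead of spinning.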

pdeljanov · 2024-01-11T03:13:45Z

symphonia-format-caf/src/chunks.rs

+            format_id,
+            bytes_per_packet: reader.read_be_u32()?,
+            frames_per_packet: reader.read_be_u32()?,
+            channels_per_frame: reader.read_be_u32()?,


It is illegal for channels_per_frame to be 0.

irh

There's a check for the channel count being zero in CafReader::read_audio_description_chunk(), but I think it makes sense to report an error as soon as possible. Does it make sense to leave the extra check in place (I guess the alternative is replacing it with unreachable!())?

pdeljanov · 2024-01-13T01:46:11Z

Either way works for me.

In such cases, I think I'd only strongly prefer the latter over the former if the additional check was in a hot code path. Mainly because we already set up the precondition that the field will never be 0, so why waste the cycles? However, I don't think FormatReaders are ever really "hot" so you can choose whatever you think is best for maintainability.

irh

I think for maintainability replacing the error with an assertion makes sense, otherwise the maintainer is left wondering if there's ever a case where the first check could pass and then the second check could fail. There's no such case, so an assertion communicates that more clearly.
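
The outcome of this thread can be sketched as follows: reject the invalid value once, at parse time, and document the resulting invariant with unreachable!() at the later use site. The types and function names below are hypothetical stand-ins, not the PR's actual AudioDescription or CafReader code.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    Invalid(&'static str),
}

// Hypothetical, trimmed-down stand-in for the audio description chunk.
#[derive(Debug)]
struct AudioDescription {
    channels_per_frame: u32,
}

impl AudioDescription {
    /// Validates the field as soon as it is read, so every value that
    /// exists after parsing is known to have a non-zero channel count.
    fn parse(bytes: [u8; 4]) -> Result<Self, ParseError> {
        let channels_per_frame = u32::from_be_bytes(bytes);
        if channels_per_frame == 0 {
            return Err(ParseError::Invalid("caf: channels per frame is zero"));
        }
        Ok(Self { channels_per_frame })
    }
}

/// Later consumer: the zero case cannot occur, so it is stated as an
/// invariant rather than handled as a second recoverable error.
fn channel_layout_name(desc: &AudioDescription) -> &'static str {
    match desc.channels_per_frame {
        0 => unreachable!("parse() rejects a zero channel count"),
        1 => "mono",
        2 => "stereo",
        _ => "multichannel",
    }
}
```

The invariant only holds because parse() is the sole way to build the value in this sketch; if the struct could be constructed elsewhere, the second check would need to stay a real error.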

pdeljanov · 2024-01-11T03:58:00Z

symphonia-format-caf/src/demuxer.rs

+                }
+                else {
+                    error!("Invalid packet index: {}", current_packet_index);
+                    return decode_error("Invalid packet index");


We have some conventions for the messages used in errors and logging. Use only lowercase for both types.

For errors, since the source module is not known, please prefix the format or decoder name. In this case, please use the prefix "caf: ". You don't need to do this with the logging macros because they record the module emitting the log.

Please adjust accordingly.
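
Applied to the snippet above, the conventions might look like the sketch below. DecodeError and decode_error here are local stand-ins that only mirror the shape of the decode_error call in the diff, and eprintln! stands in for the logging macro so the example has no dependencies; none of this is the PR's actual code.

```rust
use std::fmt;

// Stand-in error type for the example.
#[derive(Debug)]
struct DecodeError(&'static str);

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.0)
    }
}

// Stand-in for the decode_error helper used in the diff.
fn decode_error<T>(msg: &'static str) -> Result<T, DecodeError> {
    Err(DecodeError(msg))
}

// Hypothetical bounds check on a packet index.
fn check_packet_index(index: usize, packet_count: usize) -> Result<usize, DecodeError> {
    if index < packet_count {
        return Ok(index);
    }
    // Log message: lowercase, no prefix (the logger records the module).
    eprintln!("invalid packet index: {}", index);
    // Error message: lowercase, prefixed with the format name.
    decode_error("caf: invalid packet index")
}
```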

irh

Done, although I couldn't build with 1.53 to check that I haven't used any other post-1.53 features (there's a problem with finding a supported version of regex).

pdeljanov

Ah, I see. The regex crate is being pulled in by env_logger which is used by symphonia-play and symphonia-check, but not the main libraries. I tried building with 1.53.0 though, and it seems the log crate now requires 1.60.0, so this is unfortunate. We can ignore this then.

irh · 2024-01-12T21:01:25Z

Thanks for the review @pdeljanov, I've followed up on your feedback and pushed the changes, although I haven't double checked against my test files yet. I should have time to check them out over the weekend, I'll follow up when I'm done.

pdeljanov · 2024-01-13T01:40:22Z

symphonia-format-caf/src/chunks.rs

+        let chunk_size_u64 = chunk_size as u64;
+        let edit_count_offset = size_of::<u32>() as u64;
+
+        if chunk_size != -1 && chunk_size_u64 < edit_count_offset {


Suggestion: I feel this check is a bit too clever and harms readability, at least until one recalls how signed two's complement works. Consider doing it the more "naive" way.

irh

You're right, I've tried to simplify this a bit.
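
For readers who want to see what the "naive" form of such a check could look like, here is a minimal sketch. It assumes, as the `chunk_size != -1` test in the diff suggests, that -1 is the sentinel for an unknown chunk size; the function name, the DataSize type, and the error strings are invented for this example and are not taken from the PR. Note one behavioural difference: in the original expression any other negative size casts to a huge u64 and passes the comparison, whereas this version rejects it explicitly.

```rust
use std::mem::size_of;

#[derive(Debug, PartialEq)]
enum DataSize {
    /// The chunk size was -1: the data runs to the end of the stream.
    Unknown,
    /// Number of audio bytes following the edit count field.
    Known(u64),
}

// Hypothetical helper spelling out each case of the size check in turn.
fn audio_data_len(chunk_size: i64) -> Result<DataSize, &'static str> {
    let edit_count_offset = size_of::<u32>() as u64;

    // Case 1: the sentinel value, handled before any cast.
    if chunk_size == -1 {
        return Ok(DataSize::Unknown);
    }
    // Case 2: any other negative size is invalid.
    if chunk_size < 0 {
        return Err("caf: invalid audio data chunk size");
    }
    // Case 3: non-negative, so the cast to u64 is lossless.
    let size = chunk_size as u64;
    if size < edit_count_offset {
        return Err("caf: audio data chunk is too small");
    }
    Ok(DataSize::Known(size - edit_count_offset))
}
```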

pdeljanov · 2024-01-13T02:04:43Z

Hey @irh,

Thanks, looks good to me! We can merge this when you're ready.

Before then, you should also add yourself to the CONTRIBUTORS file. Also, if you can, please update the feature tables in /README.md, /symphonia/README.md, and /symphonia/src/lib.rs. Since this is the initial release, symphonia-format-caf should have a Good rating.

irh · 2024-01-15T08:16:34Z

OK thanks for taking the time to review this and for the great feedback, I think we're good to go 👍

pdeljanov · 2024-01-15T23:35:58Z

Sounds good @irh! Do you mind re-basing this to resolve the conflicts?

irh · 2024-01-16T08:05:32Z

> Do you mind re-basing this to resolve the conflicts?

Done 👍

pdeljanov · 2024-01-16T13:44:10Z

Merged! Thank you for the work you put into this!

dedobbin reviewed Jan 9, 2024

irh force-pushed the caf-support branch from 3dc1566 to c6dd64a Compare January 9, 2024 20:21

pdeljanov requested changes Jan 11, 2024

irh force-pushed the caf-support branch from 455676c to 1f07903 Compare January 12, 2024 20:56

pdeljanov approved these changes Jan 13, 2024

irh added 15 commits January 16, 2024 08:59

Fix a typo

b4e3322

Add a placeholder CAF demuxer

e76b801

Add support for mono & stereo PCM data

e1b909e

Add support for multichannel audio

0fd819d

Add support for compressed codecs

142e9ac

Update author list

370aabc

Follow clippy's advice

d499f7d

Run cargo fmt using a nightly build

66ee5e0

Add missing copyright headers

bf397ae

Fix channel bitmap layout tag matching

4381798

Update logging and error strings to match conventions

5fbafca

Add additional chunk size checks

a8b51e2

Check for invalid fields when reading the audio description chunk

a7c725e

Fix the 'format is compressed' condition

69a719a

Move the AudioData chunk size check earlier to avoid reading too far

7db0221

irh added 6 commits January 16, 2024 08:59

Add additional validation checks to the Packet Table chunk reader

3466ee0

Avoid endless loops when reading variable-length integers

bdbb6f0

Add size checks for the all remaining chunks

0ff61a2

Simplify the chunk size check in AudioData::read()

e8144ff

Update the readme feature tables and the contributors list

10f6062

Replace a redundant channel count check with unreachable!

70970eb

irh force-pushed the caf-support branch from 8d62d27 to 70970eb Compare January 16, 2024 08:01

pdeljanov merged commit abb497b into pdeljanov:master Jan 16, 2024
8 checks passed
