Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add advanced stem loading COs #13268

Merged
merged 12 commits into from
Nov 26, 2024

Conversation

acolombier
Copy link
Member

@acolombier acolombier commented May 23, 2024

Currently, this goes with a mix of extending the existing load signal/slot and creating a second one, called subsequently. Is there any preference? Should I change the implementation to use only one? (only extending existing signal or only subsequently emit)

Here is documentation, as per in the EPIC (#13116)

Pre-mixed stereo load

[ChannelX],load_selected_track_stems can be used as followed:~

  • 1 (0b0001) means only the first stem is selected for stereo pre-mixing.
  • 13 (0b1101) means all stem but the second are selected for stereo pre-mixing.
  • 15 (0b1111) means the entire track as stereo. Since it is impossible to load that track without invalidating the replaygain due to the missing DSP support, all the four stem are currently mixed upstream. In the future, the pre-mastered track will be used.

Note that using 0 is not possible as this is the "idle state" of the CO. 0 also means default (load as stem deck on primary decks, or load the pre-mastered track on sampler/preview), which is equivalent to [ChannelX],LoadSelectedTrack

Individual bit mask will be reflected in the stem_Y_selected CO.

@acolombier acolombier changed the title Feat/add advanced stem load cos feat: add advanced stem loading COs May 23, 2024
@acolombier acolombier mentioned this pull request May 23, 2024
8 tasks
@acolombier acolombier force-pushed the feat/add-advanced-stem-load-cos branch 2 times, most recently from 30fd666 to 2f2a22a Compare May 23, 2024 19:42
@acolombier acolombier force-pushed the feat/add-advanced-stem-load-cos branch 2 times, most recently from b4db8c8 to fdd8810 Compare June 3, 2024 10:56
@acolombier acolombier force-pushed the feat/add-advanced-stem-load-cos branch 2 times, most recently from a36c6a0 to 31becb9 Compare June 29, 2024 15:57
@acolombier acolombier mentioned this pull request Jul 25, 2024
@JoergAtGithub JoergAtGithub added this to the 2.6-beta milestone Jul 28, 2024
@acolombier acolombier force-pushed the feat/add-advanced-stem-load-cos branch from 31becb9 to fab6880 Compare July 30, 2024 21:34
@acolombier
Copy link
Member Author

Please note that UI changes (1st and 3rd commits) will be moved to #13515 once this PR is ready to be merged!

@acolombier acolombier force-pushed the feat/add-advanced-stem-load-cos branch 2 times, most recently from 2fc47ba to 868a67e Compare July 31, 2024 07:45
Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts if this is consistent with the other PRs and the overall user experience.

The users sees only the track in the library which has four stems. The use can now load a single stem form a file or load the whole stem file and mute all 3 of them.
In the first case they cannot add a second stem in the second case they suffer the penalty of stem processing. They need a brief understanding of the engine interna to make the right decisions during mixing which could be limiting.

Can we join both? How can we allow to seamlessly switch between these cases.

A typical use case is probably:

  • Play/Cue a full track
  • Notice unusable part
  • Switch to stem mode an silence that part
  • Instant double the track in a second deck
  • Bring the silenced part in using an effect

How can we allow this using the new single stem mode introduced here behind the scenes?

src/engine/cachingreader/cachingreader.cpp Outdated Show resolved Hide resolved
@acolombier
Copy link
Member Author

How can we allow to seamlessly switch between these cases.

The first scenario is the only way on non-primary decks (sampler and preview), while second is the default for primary deck. realistically, I only added the second option for primary deck for feature parity and enabled that advanced capability, or workaround for light platform which would struggle with more than one stem deck + rubberband (eg. raspberrypi)

How can we allow this using the new single stem mode introduced here behind the scenes?

By default, the load process will still load the full stem file. Loading an individual stem should be reserved for "aware users". A typical use case is to load vocals or drum only in a sampler, or a deck if you want more transport control.

By the sounds of it, the use case you are describing (clone switching between individually loaded stem vs full deck with muted tracks) are falling under the the split use-case we've discussed in the epic, or am I reading this wrong?

@daschuer
Copy link
Member

Ah, OK I forget about the sampler use case.

Than let's think about a way to gerearize this topic. Since it is handled by the same engine class anyway.

How is the transition from single stereo to stem now done? Is there way to keep the transport going during the switch? A switch from the full track to a stem mix is probably the same as a switch from a stem mix to one of the other stems. Do we need to consider something like "load as karaoke" where we process three of the stems?

I am not demanding to implement it here. I just like to brainstorm possible future solution to have the Co interface proved for that.
My feeling is that current one is too static.

Idea: can we use the mute controls for selecting the loaded stems?

@acolombier
Copy link
Member Author

How is the transition from single stereo to stem now done?

Currently there is no way transition "live" for a stem deck to a stereo deck with a track. With that current feature, can chose the load a track in an empty deck. If we think of a case where one would want to switch from a stem deck to a stereo deck, they would need to use an empty deck and handle the transport synch.
In the future, we could consider the ability to do this live on a deck, but I would emit some reserves as of whether or not this is possible. Particularly, I'm not sure how we would handle the CachingReader sync, and the arming of the RubberBand. I think it safer to expect that such a transition should always be done on another deck.

Is there way to keep the transport going during the switch?

It would be fairly trivial to implement to some kind of clone_as_stereo that makes the transport cloning as well, as the existing cloning does. I'm not planing to implement cloning feature as part of this PR or the GSoC as I have recently less time (sadly back to a full-time job...), but I'm hoping to still get something together before the end of Q3.

Do we need to consider something like "load as karaoke" where we process three of the stems?

I think it would be great to even go further an allow loading a set of stem pre-mixed to stereo (eg. 2+3, 1+3+4, ...), but same as above, I think it is big enough to split it as a future feature.

Note that in case not all the above feature wouldn't be ready for 2.6 and we would like to wait for completeness before releasing, the STEM build option should still be working and allow us to keep that feature behind curtain till 2.7. Personally, I hope we can release what we have in 2.6 but it's purely subjective and selfish opinion! :)

Idea: can we use the mute controls for selecting the loaded stems?

This is already how we were planning to make the split_* CO work. Say you are playing 1+2+3+4 on A, if you split 3, you would end up with 1+2+4 on A with 3 muted, and B playing 3 but 1+2+4 muted, allowing you to play with the transport between stem track, while still allowing you eventually to "merge" back to a unify transport (by unmuting 3 on A, or 1+2+4 on B)
Now that would require a fairly powerful processing power if we consider RubberBand-ing 8 stereo channels, so I guess we could add a variant (or a user preference?) so splitting would lead to having 1+2+4 on A with 3 muted, and B playing just 3 as a stereo deck (meaning you can only merge back to A and cannot add new stem tracks to B's transport)

@JoergAtGithub
Copy link
Member

Works as expected now! Thank you!

@acolombier
Copy link
Member Author

Friendly ping @daschuer, in case you've missed the notification :)

Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some comments. Unfortunately a conflict has developed.

src/engine/channels/enginedeck.cpp Show resolved Hide resolved
// not.
constexpr int kMaxSupportedStems = 4;
#ifdef __STEM__
constexpr uint kNoStemSelected = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is a part of a 1 based stem index. But It is a value of the stem mask, right? I think it would be the best to introduce a type for stem mask, may an enum or even a complex type with getteer an setter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an enum + QFlag - this certianly makes the code cleaner!

Comment on lines 41 to 52
#ifdef __STEM__
m_loadSelectedTrackStems =
std::make_unique<ControlPushButton>(ConfigKey(group, "load_selected_track_stems"));
connect(m_loadSelectedTrackStems.get(),
&ControlObject::valueChanged,
this,
[this](double value) {
if (value >= 0 && value <= 2 << mixxx::kMaxSupportedStems) {
emit loadToGroup(m_group, static_cast<uint>(value), false);
}
});
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still in doubt that this will ever get used. Let's remove this until we have a use case for this along with a mapping that actually makes use of it. We may set up a n experimental branch, that includes this CO and the other ideas we have discussed here, that we have a test build ready and not block mappers on this. What do you think?

Suggested change
#ifdef __STEM__
m_loadSelectedTrackStems =
std::make_unique<ControlPushButton>(ConfigKey(group, "load_selected_track_stems"));
connect(m_loadSelectedTrackStems.get(),
&ControlObject::valueChanged,
this,
[this](double value) {
if (value >= 0 && value <= 2 << mixxx::kMaxSupportedStems) {
emit loadToGroup(m_group, static_cast<uint>(value), false);
}
});
#endif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I'm actually planning to use in the S4 mapping. Due to performance reasons, loading the whole stem track to just play a track accapella or instrumental is very expansive.
Often in my set, I would want to play a track instrumental, with the vocals of another track, this means having to load 8 stereo track when 2 would have been enough. This makes is particularly unusable as I will likely use a third, sometime a fourth deck.
As we are adding that feature on the UI, I think it is wrong to not allow that to be mapped - I certainly would find it a bummer if I have to pick my mouse and keyboard every time I want to go over that usecase.
What would be the downside to get this control in?

Copy link
Contributor

@Eve00000 Eve00000 Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes 2 of us using stems like that (using a cappella like that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can very much acknowledge the use case. I am just In doubt that exactly this implementation is particularly useful.
But I might be wrong. So how about postponing this question to the mapping PR? Or do you want to outline some details here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CO is the essence of this PR - and the use case is obviously there now!

I am therefore against removing this CO from this PR!

Since all the arguments have already been exchanged, I think it's time to start a vote of the Mixxx Core developers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added documentation of the CO in the description to help summarising the proposed behaviour

@acolombier acolombier requested a review from daschuer September 29, 2024 20:15
Third = 0x4,
Fourth = 0x8,

None = 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does None mean the stereo track?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like. At least it is zero if we have no stem file, right. Did you consider to implement #13700 right away.
Is there a use case for "None". Probably in future so we may want to shift Everything by 1 to free up 0x01 for stereo and model the Stem file layout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loading the premastered stereo track is clearly out of scope of this PR!
But it's possible to assign any unused value for it. The current implementation of 2^stemNo ensures simple understandable and unique stem masking accross the entire code base.
Therefore I think this is the best possible code design here!

Copy link
Member Author

@acolombier acolombier Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does None mean the stereo track?

None means no stem track selected, so the deck will be loaded as stem deck, except for samplers, as they cannot load stems. All means load the stereo track

Did you consider to implement #13700 right away.

Please see this comment and this one. TL; DR: loading the pre-mastered stereo is not possible yet.

Probably in future so we may want to shift Everything by 1 to free up 0x01 for stereo and model the Stem file layout?

I don't think we need to - AFAIK, there is 16 usecases, all covered by the flag:

  • 0: load as stem deck
  • 1..14: load as stereo with only the selected stems
  • 15: load as stereo (currently mixed manually to address DSP limitation, using the pre-mastered track when we have the DSP)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If "None" load as "stem deck" for a stem file and the stereo track in case of a normal file, we need to rename it to "Default" or such.

Loading the stereo track IS a valid use case.
Will it be used when all stems are selected or will we than just select all stems?

This question is uncritical if we remove the exposal to the co interface from this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If "None" load as "stem deck" for a stem file and the stereo track in case of a normal file, we need to rename it to "Default" or such.

Sounds fair, I will change that

Loading the stereo track IS a valid use case.

it is

Will it be used when all stems are selected or will we than just select all stems?

Currently, option 2 because no DSP support. In the future Option 1.

@@ -323,6 +323,12 @@ SoundSource::OpenResult SoundSourceSTEM::tryOpen(

AVStream* firstAudioStream = nullptr;
int stemCount = 0;
uint selectedStemMask = params.stemMask();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint selectedStemMask = params.stemMask();
mixxx::StemChannelSelection selectedStemMask = params.stemMask();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure though. Maybe keep uint for now. At one point we need integer.

@daschuer
Copy link
Member

daschuer commented Oct 1, 2024

In the future, the pre-mastered track will be used.

I think changing the meaning behind the scenes in future is not optimal, because both versions will sound different. In terms of the CO interface, we should be able to express both. We may do any of these in the GUI or the mapping. We may also add a "premastered" enum that is temporary implemented as the sum of all stems.

@acolombier
Copy link
Member Author

acolombier commented Oct 1, 2024

I think changing the meaning behind the scenes in future is not optimal, because both versions will sound different.

No they shouldn't sound different, except if our DSP support is poor or incomplete, in which case it will mean we are not ready to support pre-mastered track anyway.

If we really feel strongly about this, then we should add a new CO in the future (e.g load_selected_track_premastered) because using some kind of irrelevant mask value on load_selected_track_stem is misleading anyway

@daschuer
Copy link
Member

daschuer commented Oct 1, 2024

No they shouldn't sound different,

But they can. This is out of our control, we play what found in the file.

except if our DSP support is poor or incomplete, in which case it will mean we are not ready to support pre-mastered track anyway.

Since we have not yet a DSP stage we are already at that point when playing all stems together. I consider the "pre-matered" track as the "truth" and the stems more or less good reproductions. But that depends on wether the stem file was created during mastering or later using AI.

If we really feel strongly about this,

I do, but not yet about the final CO interface. Here I am only concerned about the internal interface, that we may expose one by one as proposed here in a CO.

My idea is to have an universal 'TrackLoad(track, mode)` command that is universal usable.

I like by the way the idea to combine track and mode into a single structure very much. I have not yet proped it because I actually want to have this merged soon as a minimum step, but this would allow to add getter and setter to implement a clean API on top of an obscure encoding of the Stem mask internals.

@daschuer
Copy link
Member

daschuer commented Oct 1, 2024

How about introducing a stem independent loading mode?

enum class LoadingMode {
    Auto = 0x00000000; // On the fly mixing (StemDeck) mode in case of stem files or the first stream found 
    // Next accordingt to 0x01 << streamIdx 
    Stream0 = 0x00000001;  alias Premasterd   
    Stream1 = 0x00000002;  alias Drums
    Stream2 = 0x00000004;  alias Bass
    Stream3 = 0x00000008;  alias Accompaniment
    Stream4 = 0x00000010;  alias Vocals
    Stream5 = 0x00000020;  
    ....
    UseStemDeck = 0x10000000;  
}

This would get rid of the -1 here:

if (selectedStemMask && !(selectedStemMask & 1 << (streamIdx - 1))) {

And opens access ato any number of streams in a certain container format

Auto equals Stream0 in case the first stream is audio.
Auto equals UseStemDeck | Drums | Bass | Accompaniment | Vocals in case of stem files.

In case of instant doubles from a StemDeck, Vocals is muted and the cloned deck is loaded as UseStemDeck | Vocals which allows to loop vocals independenly and on-the-fly adjust the mix.

In case of a vocal sample you may load the sampler just with Vocals.

An alternative instant double feature from a StemDeck may use Premasterd which allows transitions form a lossy CPU hungy stem mix to the lossless version of the same track.

I have no demand for details of my proposals, I have just considered possible modes we need when loading a track to a player and how express them as a single double precision number (Limited to 32 bit in my propsal) .

What do you think?

For this PR we even continue with the currenty implemnetd proposal as long as it is not exposed as CO.

@acolombier
Copy link
Member Author

But they can. This is out of our control, we play what found in the file.

But the spec says it should. Now I agree that it will never be byte perfect, but it sounds wrong to add CO behaviour to factor that in. Surely, we wouldn't want want mapping developer to have some way to decide if the STEM file is actually spec compliant or not? Looking at my controller, it would also feels weird to add that "load my track in stereo but don't use the pre-mastered stream" logic in here.
Arguably, I could make the same point about individual stem tracks which could turn out to be wrong (e.g when creating the stem with Stem Creator, accidentally adding twice the vocals)

It sounds like, when we support DSP thus pre-mastered track load, we could add a setting to allow to still use manually mixed stereo track instead of the pre-mastered, this way if someone using Steam Creator in a wrong way or another tool that doesn't quite match the spec, they can still find the "No DSP mitigating behaviour"

But that depends on whether the stem file was created during mastering or later using AI.

That's true - the replaygain tends to do a fairly good job on mitigating that issue tho, at least that was the case when testing with the NI Stem pack (non AI stem)

My idea is to have an universal 'TrackLoad(track, mode)` command that is universal usable.

I think that makes sense, but I'm not sure why it would be invalidating the proposed CO?
The usecase you are listing are great and make sense but they require a lot more work, and also include more logic when mappings want to use it. In the meantime, the suggested usecase with UseStemDeck | .. is already possible with subsequent CO calls:

..],LoadSelectedTrack=1
..Stem4],mute=1

It also includes a few edge case to handle (e.g Stream0 can never be used with any other value) which to me feels like we are over-fitting usecase on that CO

I would just add with the existing CO, the terminology stem is used here, so adding the concept of stream in it feels like a misleading decision for the user

@daschuer
Copy link
Member

daschuer commented Oct 1, 2024

Looking at my controller, it would also feels weird to add that "load my track in stereo but don't use the pre-mastered stream" logic in here.

I agree that we need certein limitations on the mapping and GUI to make it reasonable. My point is that the load mode is used deep in the software stack where we have STEM files and other multi stream file formats. The API in that state should be cover all and not be limited by certain recommendedn but not asserted assumptions.

Mixing form the All-Stem mode to the Premastered mode is a solution for the lossy format and CPU issue. So an underlying API should be able to deal with to be open for inuitive solution, which my proposal is not. Just a technical detail that allows to transport all info we may think of.

The usecase you are listing are great and make sense but they require a lot more work ...

I have designed them to acheve the oposite, so maybe I was not clear enough. I think we only need to change <20 lines of code to partially support it.

  • Remove the -1 during in soundsource stem to allow to use the premasterd streamIdx = 0
  • Reject not possible bit combinations (I would be not too limiting to not reduce creativity)

The idea with passing and adjusting the mute state was, that it can be implmented near BaseTrackPlayerImpl::loadTrack where also other CO are cloned by adding the mixxx::StemChannelSelection as a parameter or ideally use struct NewTrackRequest everywhere. But that is a clearly a future topic. The use case is, that we may want decide at one palce at the GUI / Mapping, how the stem deck schall be cloned and than do the muting only after plausibility checks and at the right moment after the old track has been removed and befor the new one is loaded. This is not yet possible by a maping only solution.

@daschuer
Copy link
Member

daschuer commented Oct 1, 2024

Just noticed we have alread a

void EngineDeck::cloneStemState(const EngineDeck* deckToClone) {
VERIFY_OR_DEBUG_ASSERT(deckToClone) {
return;
}
for (int stemIdx = 0; stemIdx < kMaxSupportedStems; stemIdx++) {
m_stemGain[stemIdx]->set(deckToClone->m_stemGain[stemIdx]->get());
m_stemMute[stemIdx]->set(deckToClone->m_stemMute[stemIdx]->get());
}
}
that needs to be adjusted for diffrent stem clone modes.

@acolombier
Copy link
Member Author

Sorry. but I am still not convinced.

I think not loading the pre-mastered if 0xf makes sense, but then I don't see how this is blocking the suggested simple CO, as IMHO "load_selected_track_stems" is clearly about "loading selected stem", not "loading a set of stem or maybe the first stream, or maybe all the stem as stem deck but muting it right away".

From my perspective, this is an overly complicated approach, which is planing on exposing technical details for a future that we haven't yet confirmation we will reach (supporting DSL requires a limiter and compressor that support all the spec settings, and from what I can see, this is a significant amount of work)

Appreciate we have now come in full circle, so I'm going to close this. I have open a PR on my fork for visibility.
Happy to revisit that in the future if someone can think of an alternative to the two clashing options.

@acolombier acolombier closed this Oct 1, 2024
@acolombier acolombier removed this from the 2.6-beta milestone Oct 1, 2024
@daschuer
Copy link
Member

daschuer commented Oct 1, 2024

Maybe the discussion has shift the focus away form the essence:

I consider this PR well done and ready to merge if you remove the "load_selected_track_stems" CO.
This is already a good benefit for the users and the CO may live in your privat repository until it is used in a mapping.

This gives the freedom to adjust is exactly to your mapping, without a breaking API change.

I am sorry if my posts read too demanding, this was not intended. I started this basically to collect all ideas and find the best solution for both of us. I am curious to se your mapping proposal.

@Eve00000
Copy link
Contributor

Eve00000 commented Oct 2, 2024

Is there a way to get a compromise? You both want the best and you both have good arguments.
Maybe we can make the "load_selected_track_stems" -part (very advanced stem loading) as an option in CMake / IFDEF. or as an option in preferences?

We can document that developers are in dubio about this part (and there is nothing wrong with it) and that these CO's are 'experimental'.
At the moment there are only few people using stems in Mixxx, when we release a Beta with the stem-capabilities we'll have plenty reactions from users with a lot of different wishes., maybe those reactions can clear oir mind and will show the right direction. (could be a complete other than thought).
I'm sure there will be users with high demands and expectations that want to experiment.

@daschuer
Copy link
Member

daschuer commented Oct 2, 2024

Thank you. An extra #ifdef will work for me.

@acolombier acolombier reopened this Nov 17, 2024
@acolombier acolombier force-pushed the feat/add-advanced-stem-load-cos branch from 5a19be4 to f439645 Compare November 25, 2024 20:59
@acolombier
Copy link
Member Author

The CI is failing due to the ongoing brownouts for MacOS 12 runners. I will kick the job again on Wednesday, once the brownout is over.

@daschuer could you please discard your change request?

Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@daschuer daschuer merged commit 437db42 into mixxxdj:main Nov 26, 2024
11 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants