Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: audio filter plugin #350

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open

Conversation

typester
Copy link
Member

The rust-sdk implementation (livekit/rust-sdks#559) must be merged before this pull request.

@bcherry bcherry self-requested a review January 28, 2025 07:19
Copy link
Contributor

@bcherry bcherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, lmk when it's ready for final review

livekit-rtc/livekit/rtc/_ffi_client.py Outdated Show resolved Hide resolved
@typester typester marked this pull request as ready for review February 6, 2025 21:17
@typester typester changed the title [WIP] feat: audio filter plugin feat: audio filter plugin Feb 6, 2025
Comment on lines 58 to 59
enable_filter: Any = None,
filter_options: dict[str, Any] | None = None,
Copy link
Member

@theomonnom theomonnom Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more eager to add strict typings here.
I think we should remove the filter_options and put it inside the KrispFilter constructor.
I was imagining something like:

rtc.AudioStream(krisp=KrispFilter(options)) # options can be automatically determined by our third package that will find the dynamic lib inside our wheel

I'm not sure why we make it a generic interface today, feels easier to explicitly mention Krisp, wdyt?

EDIT: another option is to mention Enhanced Noise Suppression or Background Voice Cancellation instead of Krisp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm not familiar with Python, any suggestion like this is super welcome! Let me think about this

Comment on lines 20 to 23
{
"url": url,
"token": token,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it strictly typed inside our protobufs definition instead of using json? The auth function could accept those as arguments (no need for a prost dependency)

def __init__(
self,
loop: Optional[asyncio.AbstractEventLoop] = None,
filters: Optional[List[Any]] = None,
Copy link
Member

@theomonnom theomonnom Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the constructor is tricky because users don't construct it on agents, we do it for them. IMO we should only expose this API inside the rtc.AudioStream for now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think so. But the Krisp filter needs to be initialized with the LIVEKIT_URL and TOKEN parameters to verify the token. So, I've initialized it here.

Instead, do you mean that it would be better to pass all of this information in the AudioStream constructor, as you suggested earlier? Or creating a separate init function to pass these info? Something like:

krisp.init(url, token)

or something like that. I think the latter is better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong opinion, tho I know dz wanted to avoid the auth on the python side (so grabbing the url/token on the rust side)

@typester typester requested a review from theomonnom February 12, 2025 20:04
@@ -54,6 +55,7 @@ def __init__(
capacity: int = 0,
sample_rate: int = 48000,
num_channels: int = 1,
audio_filter: Optional[Tuple[str, dict[str, any]]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to get something that looks more like #350 (comment)
This type isn't self-explanatory

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended to cover this in the filter module side, because the option parameter will vary depending on the filter module. It looks something like this:

audio_filter=noise_filter.NC()

Please refer to the noise_filter repo for more details

Copy link
Member

@theomonnom theomonnom Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we want to make this generic, if we have more filters in the future we could also add new keyword arguments, I propose to change the argument name to be directly krisp

I also doubt we're going to have more filters? (Noise cancellationEcho cancellation should be exposed directly to Python with raw API). Because we need to use it on external speakers not available on the Rust side (not tied to any rtc.AudioStream ― I wouldn't design for it inside this PR)

I think we should narrow down the API for now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing the raw API is better in terms of architecture, and I agreed with this in our previous discussion. But it ended up with it's not good with our metric system, wasn't it? So we decided to extend AudioStream for now.

In that case, I would say that it would be better to handle the filter options in noise_filter side and to keep AudioStream generic. This would make updates to noise_filter easier. What do you think?

Copy link
Member

@theomonnom theomonnom Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant the echo cancellation in the previous message.

What I mean is why we want to keep the name generic like audio_filter instead of krisp
e.g krisp=KrispFilter(options)
or could also be noise_cancellation=krisp.AudioFilter(options)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. I don't have a strong opinion on this. We're creating a noise filter package as livekit-noise-filter (Initially, it was named livekit-krisp, but we renamed it.) So, would it be the latter?

I introduced two parameters to the python-sdk.

The first one is for RoopOptions: (This is required to register the filter to the room)

await room.connect(
    os.getenv("LIVEKIT_URL"), token,
    rtc.RoomOptions(
        auto_subscribe=True,
        audio_filters=[noise_filter],
    )
)

The second is for AudioStream:

stream = rtc.AudioStream.from_track(
    track=track,
    audio_filter=noise_filter.NC(),
)

So, should both options be named noise_cancellation_modules and noise_cancellation? How do they sound?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants