-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: audio filter plugin #350
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, lmk when it's ready for final review
enable_filter: Any = None, | ||
filter_options: dict[str, Any] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more eager to add strict typings here.
I think we should remove the filter_options and put it inside the KrispFilter constructor.
I was imagining something like:
rtc.AudioStream(krisp=KrispFilter(options)) # options can be automatically determined by our third package that will find the dynamic lib inside our wheel
I'm not sure why we make it a generic interface today, feels easier to explicitly mention Krisp, wdyt?
EDIT: another option is to mention Enhanced Noise Suppression
or Background Voice Cancellation
instead of Krisp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I'm not familiar with Python, any suggestion like this is super welcome! Let me think about this
{ | ||
"url": url, | ||
"token": token, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make it strictly typed inside our protobufs definition instead of using json? The auth function could accept those as arguments (no need for a prost dependency)
livekit-rtc/livekit/rtc/room.py
Outdated
def __init__( | ||
self, | ||
loop: Optional[asyncio.AbstractEventLoop] = None, | ||
filters: Optional[List[Any]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the constructor is tricky because users don't construct it on agents, we do it for them. IMO we should only expose this API inside the rtc.AudioStream for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think so. But the Krisp filter needs to be initialized with the LIVEKIT_URL and TOKEN parameters to verify the token. So, I've initialized it here.
Instead, do you mean that it would be better to pass all of this information in the AudioStream constructor, as you suggested earlier? Or creating a separate init
function to pass these info? Something like:
krisp.init(url, token)
or something like that. I think the latter is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no strong opinion, tho I know dz wanted to avoid the auth on the python side (so grabbing the url/token on the rust side)
@@ -54,6 +55,7 @@ def __init__( | |||
capacity: int = 0, | |||
sample_rate: int = 48000, | |||
num_channels: int = 1, | |||
audio_filter: Optional[Tuple[str, dict[str, any]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way to get something that looks more like #350 (comment)
This type isn't self-explanatory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended to cover this in the filter module side, because the option parameter will vary depending on the filter module. It looks something like this:
audio_filter=noise_filter.NC()
Please refer to the noise_filter repo for more details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we want to make this generic, if we have more filters in the future we could also add new keyword arguments, I propose to change the argument name to be directly krisp
I also doubt we're going to have more filters? (Noise cancellationEcho cancellation should be exposed directly to Python with raw API). Because we need to use it on external speakers not available on the Rust side (not tied to any rtc.AudioStream
― I wouldn't design for it inside this PR)
I think we should narrow down the API for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exposing the raw API is better in terms of architecture, and I agreed with this in our previous discussion. But it ended up with it's not good with our metric system, wasn't it? So we decided to extend AudioStream for now.
In that case, I would say that it would be better to handle the filter options in noise_filter side and to keep AudioStream generic. This would make updates to noise_filter easier. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant the echo cancellation in the previous message.
What I mean is why we want to keep the name generic like audio_filter instead of krisp
e.g krisp=KrispFilter(options)
or could also be noise_cancellation=krisp.AudioFilter(options)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. I don't have a strong opinion on this. We're creating a noise filter package as livekit-noise-filter
(Initially, it was named livekit-krisp
, but we renamed it.) So, would it be the latter?
I introduced two parameters to the python-sdk.
The first one is for RoopOptions
: (This is required to register the filter to the room)
await room.connect(
os.getenv("LIVEKIT_URL"), token,
rtc.RoomOptions(
auto_subscribe=True,
audio_filters=[noise_filter],
)
)
The second is for AudioStream
:
stream = rtc.AudioStream.from_track(
track=track,
audio_filter=noise_filter.NC(),
)
So, should both options be named noise_cancellation_modules
and noise_cancellation
? How do they sound?
The
rust-sdk
implementation (livekit/rust-sdks#559) must be merged before this pull request.