ModelUse
The classify tool uses a trained model to provide classification results for one or more sounds. It can be run on sounds in the file system or on sounds provided to it over REST when running in server mode.
A number of output formats are available, including one line per classification, ranked results for each classification, and a comparison of classifications against labels defined in a metadata-formatted CSV file. The examples below assume a model named mylpknn.cfr, saved in the local file system, that was trained on 250 msec clips.
To produce a simple one-line classification for each 250 msec segment of two wav files,
% classify -file mylpknn.cfr -clipLen 250 ambient.wav voice.wav
Loading aisp properties from file:/c:/dev/aisp/aisp.properties
Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
C:\dev\sounds\video-tutorial\.\ambient.wav[0][0-250 msec]: class=click confidence=0.4049
C:\dev\sounds\video-tutorial\.\ambient.wav[1][250-500 msec]: class=ambient confidence=0.3886
C:\dev\sounds\video-tutorial\.\voice.wav[0][0-250 msec]: class=voice confidence=0.5207
C:\dev\sounds\video-tutorial\.\voice.wav[1][250-500 msec]: class=voice confidence=0.5197
Or, if you have a metadata-formatted CSV file,
% cat classify.csv
voice.wav,class=ambient,
ambient.wav,class=ambient,
% classify -file mylpknn.cfr -sounds classify.csv -clipLen 250
Loading aisp properties from file:/c:/dev/aisp/aisp.properties
Loading sounds from [classify.csv]
Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
C:\dev\sounds\video-tutorial/voice.wav[0][0-250 msec]: class=voice confidence=0.5207
C:\dev\sounds\video-tutorial/voice.wav[1][250-500 msec]: class=voice confidence=0.5197
C:\dev\sounds\video-tutorial/ambient.wav[0][0-250 msec]: class=click confidence=0.4049
C:\dev\sounds\video-tutorial/ambient.wav[1][250-500 msec]: class=ambient confidence=0.3886
To get more details on the classification rankings (not all models provide rankings),
% classify -file mylpknn.cfr -clipLen 250 ambient.wav voice.wav -ranked
Loading aisp properties from file:/c:/dev/aisp/aisp.properties
Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
C:\dev\sounds\video-tutorial\.\ambient.wav[0][0-250 msec]:
class=click confidence=0.4049
class=ambient confidence=0.3334
class=voice confidence=0.2617
class= confidence=0.0000
C:\dev\sounds\video-tutorial\.\ambient.wav[1][250-500 msec]:
class=ambient confidence=0.3886
class=click confidence=0.3513
class=voice confidence=0.2600
class= confidence=0.0000
C:\dev\sounds\video-tutorial\.\voice.wav[0][0-250 msec]:
class=voice confidence=0.5207
class=ambient confidence=0.2592
class=click confidence=0.2201
class= confidence=0.0000
C:\dev\sounds\video-tutorial\.\voice.wav[1][250-500 msec]:
class=voice confidence=0.5197
class=ambient confidence=0.2749
class=click confidence=0.2053
class= confidence=0.0000
If you have a metadata-formatted CSV file of labeled sounds/segments, you can generate a confusion matrix as follows,
% classify -file mylpknn.cfr -sounds metadata.csv -clipLen 250 -cm
Loading aisp properties from file:/c:/dev/aisp/aisp.properties
Loading sounds from [metadata.csv]
Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
COUNT MATRIX
Predicted ->[ ambient ][ click ][ voice ]
ambient ->[ * 260 * ][ 0 ][ 1 ]
click ->[ 0 ][ * 5 * ][ 0 ]
voice ->[ 0 ][ 0 ][ * 76 * ]
PERCENT MATRIX
Predicted ->[ ambient ][ click ][ voice ]
ambient ->[ * 76.02 * ][ 0.00 ][ 0.29 ]
click ->[ 0.00 ][ * 1.46 * ][ 0.00 ]
voice ->[ 0.00 ][ 0.00 ][ * 22.22 * ]
Label | Count | F1 | Precision | Recall
ambient | 261 | 99.808 | 100.000 | 99.617
click | 5 | 100.000 | 100.000 | 100.000
voice | 76 | 99.346 | 98.701 | 100.000
Micro-averaged | 342 | 99.708 | 99.708 | 99.708
Macro-averaged | 342 | 99.718 | 99.567 | 99.872
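The per-label metrics can be checked against the count matrix. For example, for the ambient label, reading across its row for recall and down its column for precision,
Recall = 260 / 261 = 99.617% (actual ambient segments classified as ambient)
Precision = 260 / 260 = 100.000% (segments classified as ambient that were actually ambient)
F1 = 2 * Precision * Recall / (Precision + Recall) = 99.808%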
Sometimes the data labels themselves contain mistakes. The -compare option can help identify potential errors by comparing a data set (often the training data set) with the results produced by the model for that data set. For example,
% cat classify.csv
voice.wav,class=ambient,
ambient.wav,class=ambient,
% classify -file mylpknn.cfr -sounds classify.csv -clipLen 250 -compare
Loading aisp properties from file:/c:/dev/aisp/aisp.properties
Loading sounds from [classify.csv]
Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
C:\dev\sounds\video-tutorial/voice.wav[0][0-250 msec]: class=voice(!=ambient) confidence=0.5207
C:\dev\sounds\video-tutorial/voice.wav[1][250-500 msec]: class=voice(!=ambient) confidence=0.5197
C:\dev\sounds\video-tutorial/ambient.wav[0][0-250 msec]: class=click(!=ambient) confidence=0.4049
C:\dev\sounds\video-tutorial/ambient.wav[1][250-500 msec]: class=ambient(==ambient) confidence=0.3886
We can see that both segments of voice.wav and the first segment of ambient.wav may be mislabeled. At this point, it may be worth going back and reviewing these segments to be sure they are labeled correctly.
The classify tool can also operate in server mode, in which it opens an HTTP port and accepts REST requests. To start the server,
% classify -file mylpknn.cfr -server
Loading aisp properties from file:/c:/dev/aisp/aisp.properties
[main] INFO org.eclipse.jetty.util.log - Logging initialized @394ms to org.eclipse.jetty.util.log.Slf4jLog
[main] INFO org.eclipse.jetty.server.Server - jetty-9.4.1.v20170120
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@65ce4c10{HTTP/1.1,[http/1.1]}{0.0.0.0:80}
[main] INFO org.eclipse.jetty.server.Server - Started @1091ms
Classify server started on port 80.
[/classifyWAV]=>org.eng.aisp.tools.Classify$ClassifyServlet-261bfac7
[/]=>org.eclipse.jetty.servlet.ServletHandler$Default404Servlet-b9d35593
The server is now ready to receive wav files (ideally 250 msec clips for our model) and return classification results over REST. For example, to make a request to the server using curl,
% curl --data-binary "@voice.wav" -H "Content-Type:audio/wav" http://localhost/classifyWAV
{"class":{"labelName":"class","labelValue":"voice","confidence":0.5349207159359615,"rankedValues":[{"labelValue":"voice","confidence":0.5349207159359615},{"labelValue":"ambient","confidence":0.2573273018006295},{"labelValue":"click","confidence":0.207751982263409},{"labelValue":"","confidence":0.0}]}}
Note that this classification is for the whole voice.wav file, which is NOT 250 msec long.
Not all models can classify clips whose length differs from the training clip length (e.g. the neural network models dcase, neural-net, and cnn).
The knn models can, however, which is why a single classification result is produced for the whole wav file.
When a fixed clip length is required, it is the caller's responsibility to extract and provide clips of the correct length to the server, as sketched below.
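For example, one way to extract the first 250 msec of voice.wav before sending it is with an external tool such as sox (not part of aisp; shown here only as a sketch, assuming sox is installed),
% sox voice.wav clip.wav trim 0 0.250
% curl --data-binary "@clip.wav" -H "Content-Type:audio/wav" http://localhost/classifyWAV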