
ModelUse

David Wood edited this page Aug 29, 2022 · 3 revisions

Classifying Using A Trained Model

The classify tool uses a trained model to provide classification results for one or more sounds. It can be run on sounds in the file system or on sounds provided to it over REST when running in server mode.

File System Mode

There are a number of output formats available, including one line per classification, ranked results for each classification, or a comparison of classifications against labels defined in a metadata-formatted CSV file. The following examples assume a model named mylpknn.cfr saved in the local file system and trained on 250 msec clips.

Single line classification

To produce a simple one-line classification for each 250 msec segment of two wav files,

% classify -file mylpknn.cfr -clipLen 250 ambient.wav voice.wav
Loading aisp properties from file:/c:/dev/aisp/aisp.properties

Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
C:\dev\sounds\video-tutorial\.\ambient.wav[0][0-250 msec]: class=click confidence=0.4049
C:\dev\sounds\video-tutorial\.\ambient.wav[1][250-500 msec]: class=ambient confidence=0.3886
C:\dev\sounds\video-tutorial\.\voice.wav[0][0-250 msec]: class=voice confidence=0.5207
C:\dev\sounds\video-tutorial\.\voice.wav[1][250-500 msec]: class=voice confidence=0.5197

or if you have a metadata-formatted CSV file,

% cat classify.csv
voice.wav,class=ambient,
ambient.wav,class=ambient,

% classify -file mylpknn.cfr -sounds classify.csv -clipLen 250
Loading aisp properties from file:/c:/dev/aisp/aisp.properties

Loading sounds from [classify.csv]
Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
C:\dev\sounds\video-tutorial/voice.wav[0][0-250 msec]: class=voice confidence=0.5207
C:\dev\sounds\video-tutorial/voice.wav[1][250-500 msec]: class=voice confidence=0.5197
C:\dev\sounds\video-tutorial/ambient.wav[0][0-250 msec]: class=click confidence=0.4049
C:\dev\sounds\video-tutorial/ambient.wav[1][250-500 msec]: class=ambient confidence=0.3886
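The metadata-formatted CSV is just one file,class=label, line per sound, as shown above. As a minimal sketch (the ambient label value here is a placeholder, not something the tool requires), such a file can be generated from the shell:

```shell
# Minimal sketch: write a metadata-formatted CSV (one "file,class=label," line
# per sound) for two wav files; the "ambient" label is a placeholder.
printf '%s,class=ambient,\n' voice.wav ambient.wav > classify.csv
cat classify.csv
```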

Ranked classification results

To get more details on the classification rankings (not all models provide rankings),

% classify -file mylpknn.cfr -clipLen 250 ambient.wav voice.wav -ranked
Loading aisp properties from file:/c:/dev/aisp/aisp.properties

Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
C:\dev\sounds\video-tutorial\.\ambient.wav[0][0-250 msec]:
 class=click confidence=0.4049
 class=ambient confidence=0.3334
 class=voice confidence=0.2617
 class= confidence=0.0000
C:\dev\sounds\video-tutorial\.\ambient.wav[1][250-500 msec]:
 class=ambient confidence=0.3886
 class=click confidence=0.3513
 class=voice confidence=0.2600
 class= confidence=0.0000
C:\dev\sounds\video-tutorial\.\voice.wav[0][0-250 msec]:
 class=voice confidence=0.5207
 class=ambient confidence=0.2592
 class=click confidence=0.2201
 class= confidence=0.0000
C:\dev\sounds\video-tutorial\.\voice.wav[1][250-500 msec]:
 class=voice confidence=0.5197
 class=ambient confidence=0.2749
 class=click confidence=0.2053
 class= confidence=0.0000
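Because each ranked entry appears on its own confidence= line, the output is easy to post-process. A sketch (the 0.45 threshold is arbitrary, and the sample lines below are abbreviated from the output above) that keeps only high-confidence entries:

```shell
# Sketch: keep only ranked entries at or above an arbitrary 0.45 confidence
# threshold; header lines have no "confidence=" field and are skipped.
awk -F'confidence=' 'NF > 1 && $2 + 0 >= 0.45' <<'EOF'
ambient.wav[0][0-250 msec]:
 class=click confidence=0.4049
 class=ambient confidence=0.3334
voice.wav[0][0-250 msec]:
 class=voice confidence=0.5207
EOF
```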

Confusion matrix generation

If you have a metadata-formatted CSV file of labeled sounds/segments, you can generate a confusion matrix as follows,

classify -file mylpknn.cfr -sounds metadata.csv -clipLen 250 -cm
Loading aisp properties from file:/c:/dev/aisp/aisp.properties


Loading sounds from [metadata.csv]
Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
COUNT MATRIX
Predicted  ->[   ambient   ][    click    ][    voice    ]
ambient    ->[ *    260 *  ][        0    ][        1    ]
click      ->[        0    ][ *      5 *  ][        0    ]
voice      ->[        0    ][        0    ][ *     76 *  ]

PERCENT MATRIX
Predicted  ->[   ambient   ][    click    ][    voice    ]
ambient    ->[  * 76.02 *  ][     0.00    ][     0.29    ]
click      ->[     0.00    ][  *  1.46 *  ][     0.00    ]
voice      ->[     0.00    ][     0.00    ][  * 22.22 *  ]

Label           |  Count |        F1 | Precision |    Recall
ambient         |    261 |    99.808 |   100.000 |    99.617
click           |      5 |   100.000 |   100.000 |   100.000
voice           |     76 |    99.346 |    98.701 |   100.000
Micro-averaged  |    342 |    99.708 |    99.708 |    99.708
Macro-averaged  |    342 |    99.718 |    99.567 |    99.872
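The per-label metrics follow directly from the count matrix. As a check (a sketch, not tool functionality), the ambient row can be recomputed: 260 segments were correctly predicted ambient out of 260 predicted and 261 actual:

```shell
# Sketch: recompute the ambient row of the metrics table from the count
# matrix above (260 correct, 260 predicted ambient, 261 actual ambient).
awk 'BEGIN {
  tp = 260; predicted = 260; actual = 261
  p  = 100 * tp / predicted          # precision = TP / predicted positives
  r  = 100 * tp / actual             # recall    = TP / actual positives
  f1 = 2 * p * r / (p + r)           # F1 = harmonic mean of P and R
  printf "F1=%.3f Precision=%.3f Recall=%.3f\n", f1, p, r
}'
```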

Identifying potential mis-labeling

Sometimes the data labels themselves contain mistakes. The -compare option can help identify potential errors by comparing a data set (often the training data set) with the results produced by the model for that data set. For example,

% cat classify.csv
voice.wav,class=ambient,
ambient.wav,class=ambient,

% classify -file mylpknn.cfr -sounds classify.csv -clipLen 250 -compare
Loading aisp properties from file:/c:/dev/aisp/aisp.properties

Loading sounds from [classify.csv]
Sounds will be clipped every 250 msec into 250 msec clips (padding=NoPad)
C:\dev\sounds\video-tutorial/voice.wav[0][0-250 msec]: class=voice(!=ambient) confidence=0.5207
C:\dev\sounds\video-tutorial/voice.wav[1][250-500 msec]: class=voice(!=ambient) confidence=0.5197
C:\dev\sounds\video-tutorial/ambient.wav[0][0-250 msec]: class=click(!=ambient) confidence=0.4049
C:\dev\sounds\video-tutorial/ambient.wav[1][250-500 msec]: class=ambient(==ambient) confidence=0.3886

We can see that all segments of voice.wav and the first segment of ambient.wav may be mis-labeled. At this point, it may be worth going back and reviewing these segments to be sure they are labeled correctly.
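Because -compare marks each disagreement with a (!=...) annotation, suspect segments can be counted quickly. A sketch applied to the four output lines above:

```shell
# Sketch: count potentially mis-labeled segments by counting the "(!=" marker
# that -compare prints when the prediction disagrees with the label.
grep -c '(!=' <<'EOF'
voice.wav[0][0-250 msec]: class=voice(!=ambient) confidence=0.5207
voice.wav[1][250-500 msec]: class=voice(!=ambient) confidence=0.5197
ambient.wav[0][0-250 msec]: class=click(!=ambient) confidence=0.4049
ambient.wav[1][250-500 msec]: class=ambient(==ambient) confidence=0.3886
EOF
```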

Server Mode

The classify tool can operate in server mode, in which it opens an HTTP port and accepts REST requests. To start the server,

% classify -file mylpknn.cfr -server
Loading aisp properties from file:/c:/dev/aisp/aisp.properties

[main] INFO org.eclipse.jetty.util.log - Logging initialized @394ms to org.eclipse.jetty.util.log.Slf4jLog
[main] INFO org.eclipse.jetty.server.Server - jetty-9.4.1.v20170120
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@65ce4c10{HTTP/1.1,[http/1.1]}{0.0.0.0:80}
[main] INFO org.eclipse.jetty.server.Server - Started @1091ms
Classify server started on port 80.
[/classifyWAV]=>org.eng.aisp.tools.Classify$ClassifyServlet-261bfac7
[/]=>org.eclipse.jetty.servlet.ServletHandler$Default404Servlet-b9d35593

The server is now ready to receive wav files (ideally 250 msec for our model) over REST and return classification results. For example, to make a request to the server using curl,

% curl --data-binary "@voice.wav" -H "Content-Type:audio/wav" http://localhost/classifyWAV
{"class":{"labelName":"class","labelValue":"voice","confidence":0.5349207159359615,"rankedValues":[{"labelValue":"voice","confidence":0.5349207159359615},{"labelValue":"ambient","confidence":0.2573273018006295},{"labelValue":"click","confidence":0.207751982263409},{"labelValue":"","confidence":0.0}]}}

Note that this classification is for the whole voice.wav file, which is NOT 250 msec long.
Not all models can classify clips whose length differs from the training clip length (e.g., the neural net models dcase, neural-net, and cnn). The knn models can, however, so a single classification result is produced for the whole wav file. When a fixed clip length is required, it is the caller's responsibility to extract and provide clips of the correct length to the server.
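Clients will usually want to pull the top label out of the JSON response. A minimal sketch using only python3's standard json module, applied to the example payload returned by the curl call above:

```shell
# Sketch: extract the top label and confidence from the server's JSON
# response (the example payload from the curl call above).
RESPONSE='{"class":{"labelName":"class","labelValue":"voice","confidence":0.5349207159359615,"rankedValues":[{"labelValue":"voice","confidence":0.5349207159359615},{"labelValue":"ambient","confidence":0.2573273018006295},{"labelValue":"click","confidence":0.207751982263409},{"labelValue":"","confidence":0.0}]}}'
echo "$RESPONSE" | python3 -c 'import json, sys
c = json.load(sys.stdin)["class"]
print(c["labelValue"], c["confidence"])'
```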