
modality settings. #6

Open
zgp123-wq opened this issue Feb 21, 2024 · 20 comments

I'm encountering several challenges while training my model and would appreciate guidance on resolving them, particularly regarding the modality settings.

  1. Error with Batch Size Setting at 12:
    When setting the batch size to 12, I encountered a RuntimeError due to inconsistent tensor sizes during stacking. The error message states: "stack expects each tensor to be equal size, but got [1919, 3, 40, 40] at entry 0 and [3673, 3, 40, 40] at entry 1."

  2. Out-of-Memory Error during Training with Batch Size 1:
    Despite reducing the batch size to 1 during training and selecting modality as "video," "mfcc," "vggish," and "VA_continuous_label," I faced persistent out-of-memory errors. It seems that the memory consumption remains high. Could you suggest strategies to mitigate memory usage during training, considering the specified modalities?

  3. Calculation Issue with Concordance Correlation Coefficient (CCC) during Validation:
    During the validation process, I encountered difficulties in calculating the Concordance Correlation Coefficient (CCC). Despite employing standard procedures and specifying modality as "mfcc," "vggish," "bert," and "VA_continuous_label," the calculation consistently fails. Are there any specific considerations regarding the modality setting that might affect CCC calculation? I would appreciate any advice on ensuring accurate CCC calculation during validation.

Jesayy commented Feb 21, 2024

Hello, may I ask what error you encountered with the last question? When I ran it, I got "No such file or directory: '/aff_wild2_abaw5/mean_std_info.pkl'". Do you have this file?

sucv (Owner) commented Feb 22, 2024

Hi zgp123-wq,

  1. Does it happen with batch size 8 or 2? If 8 or 2 works, the only way to find out is to debug the code: set a breakpoint in __getitem__(self, index) in dataset.py, see which trial causes the error, then go back to dataset_info.pkl, check the length recorded there against the actual mp4 length, etc.
  2. I don't run the code on my own desktop; I run it on computing servers, usually with hundreds of GB of RAM and video RAM. Running on a local PC with, say, 32 GB of RAM may lead to this error. However, in case it is caused by problematic code, please still set breakpoints, probably in dataset.py, to see what exactly caused the error and whether it is a code issue or truly insufficient RAM.
  3. Does the error occur only for those three modalities? It should have nothing to do with the modalities. Please set a breakpoint at the line calculating the CCC to make sure the preds and labels have the correct shapes, e.g. Nx1 and Nx1.

In short, I cannot answer your questions directly; the only way is to debug line by line. Sorry I wrote the code in such a poor manner.
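For the CCC question, a quick way to run the shape check suggested above is to compute the metric with a small standalone function that asserts the inputs align first. This is a generic sketch of Lin's concordance correlation coefficient, not the repository's own implementation:

```python
import numpy as np

def ccc(preds, labels):
    """Lin's concordance correlation coefficient for 1-D sequences."""
    preds = np.asarray(preds, dtype=np.float64).reshape(-1)
    labels = np.asarray(labels, dtype=np.float64).reshape(-1)
    # The shape check: both must flatten to the same length N.
    assert preds.shape == labels.shape, (preds.shape, labels.shape)
    mean_p, mean_l = preds.mean(), labels.mean()
    var_p, var_l = preds.var(), labels.var()
    cov = ((preds - mean_p) * (labels - mean_l)).mean()
    return 2.0 * cov / (var_p + var_l + (mean_p - mean_l) ** 2)

# Perfect agreement yields CCC = 1.
print(ccc(np.arange(10), np.arange(10)))  # 1.0
```

If the assertion fires during validation, the problem is in how preds and labels are gathered per trial, not in the CCC formula itself.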

Jesayy commented Mar 4, 2024

Hello, I've encountered a similar error to the first problem you mentioned. Have you resolved it?

@praveena2j

I think there is a bug in the code. The torch DataLoader expects samples of equal shape, but in the code each trial is returned as one sample from __getitem__. Since trials have different lengths, stacking them throws this error.

@praveena2j

From the paper, I think the trials are divided into temporal sequences of length 300, and each window of 300 frames is treated as one sample. Can you please confirm this?

@praveena2j

I think to fix the issue we need to set the windowing parameter to True in base/experiment.py:

```python
self.data_arranger.generate_partitioned_trial_list(window_length=self.window_length,
                                                   hop_length=self.hop_length, fold=fold,
                                                   windowing=True)
```

sucv (Owner) commented Mar 7, 2024

> From the paper, I think the trials are divided into temporal sequences of 300 and each samples is considered as a sample of 300 samples, can you please confirm this ?

confirmed.

sucv (Owner) commented Mar 7, 2024

> I think to fix the issue we need to change the windowing parameter to True in base/experiment.py line
>
> `self.data_arranger.generate_partitioned_trial_list(window_length=self.window_length, hop_length=self.hop_length, fold=fold, windowing=True)`

Thanks for your debugging and explanation. Indeed, setting windowing=True uses a sliding window to sample each trial, whereas windowing=False loads the complete trial. The latter is useful when you want to generate output for the test trials with batch_size = 1.

But even when windowing=True, the output is still restored to the trial's original length (averaged where a time step has multiple outputs due to window overlap), and the epoch CCC is then calculated over the restored output and the VA labels.
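The restore step described above (stitching per-window outputs back to the trial's original length, averaging time steps covered by overlapping windows) can be sketched as follows; this is an illustration of the idea, not the repository's code:

```python
import numpy as np

def restore_from_windows(window_outputs, starts, trial_len):
    """Stitch per-window 1-D outputs back to trial length, averaging overlaps."""
    summed = np.zeros(trial_len, dtype=np.float64)
    counts = np.zeros(trial_len, dtype=np.float64)
    for out, start in zip(window_outputs, starts):
        summed[start:start + len(out)] += out
        counts[start:start + len(out)] += 1
    return summed / np.maximum(counts, 1)  # guard against uncovered steps

# A trial of 8 steps, windows of length 4 with hop 2 (overlap of 2):
signal = np.arange(8.0)
starts = [0, 2, 4]
windows = [signal[s:s + 4] for s in starts]
restored = restore_from_windows(windows, starts, len(signal))
print(np.allclose(restored, signal))  # True: averaging overlaps recovers the trial
```

Because the epoch CCC is computed on this restored sequence against the full VA labels, windowed training and whole-trial evaluation remain directly comparable.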

@praveena2j

Thank you for the confirmation

zgp123-wq (Author) commented

Thank you very much for your reply. However, I have noticed a strange phenomenon: in the training set evaluation, the CCC score is only slightly above 50, while in the validation set evaluation it reaches 65. In theory, the CCC score on the training set should be very close to 100.

@praveena2j

May I know where the data annotations are loaded in the script?

sucv (Owner) commented Mar 8, 2024

> Thank you very much for your reply. However, I have noticed a strange phenomenon: in the training set evaluation, the CCC score is only slightly above 50, while in the validation set evaluation, it reaches 65. In theory, the CCC score in the training set evaluation should be very close to 100.

Your "theory" has a flaw then. Let me ask you:

Model A: training CCC 100, validation CCC 55
Model B: training CCC 55, validation CCC 54

Which model instance would you choose for submission?

Actually, you should call it "strange" only if the code had reached a CCC of 100 in training.

@praveena2j

Can you share the "dataset_info.pkl" and "mean_std_info.pkl"?

@praveena2j

Also, can you please let me know whether you trim the videos? "trim_video_fn" is never called in the preprocessing.py script, since the "trim_video" param is not set in config.py.

zgp123-wq (Author) commented

I'm sorry, I just saw the message. I have attached the train_ccc and val_ccc results, as well as the compressed file containing "dataset_info.pkl" and "mean_std_info.pkl": [datasetinfo_mean_sttdd.zip](https://github.com/sucv/ABAW3/files/14544763/datasetinfo_mean_sttdd.zip) @sucv @praveena2j

sucv (Owner) commented Mar 9, 2024 via email

sucv (Owner) commented Mar 9, 2024 via email

@praveena2j

Thanks for sharing the dataset_info.pkl file. It contains only 418 videos; I guess it is supposed to have 594 videos.

sucv (Owner) commented Mar 10, 2024 via email

@praveena2j

Thanks for the update.
