Expert Creation Explanation #10
Thank you for your reply.
Thank you so much! Can this approach be applied to the Llama-2 model as well? Can you suggest the changes needed to implement it?
This approach cannot be directly applied to Llama-2 because its activation function is SiLU instead of ReLU, and with SiLU the activation sparsity is not high enough. We are working on converting Llama-2 into a ReLU version. If we make any new progress, we will update it here.
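For intuition, here is a minimal sketch (illustrative only, not part of the MoEfication codebase) contrasting the exact-zero sparsity of ReLU with the near-zero behavior of SiLU on random pre-activations; the 1e-3 threshold is an arbitrary choice:

```python
import torch

# Illustrative comparison (assumed setup, not from this repo): how often are
# activations (near-)zero under ReLU vs. SiLU for standard-normal inputs?
x = torch.randn(100_000)

relu_out = torch.relu(x)
silu_out = torch.nn.functional.silu(x)  # silu(x) = x * sigmoid(x)

# ReLU zeroes out roughly half of the inputs exactly; this is the sparsity
# that expert splitting exploits.
print("ReLU exact-zero fraction:", (relu_out == 0).float().mean().item())

# SiLU is smooth and non-zero almost everywhere, so even with a loose
# threshold very few activations count as "inactive".
print("SiLU |y| < 1e-3 fraction:", (silu_out.abs() < 1e-3).float().mean().item())
```

With this setup the ReLU fraction is about 0.5 while the SiLU fraction is close to zero, which is why a ReLU conversion is needed before the expert-splitting approach applies.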
Okay, thank you so much!
Hello,
When I execute:
python moefication/param_cluster_example.py --model_path bert-sst2-bsz32/epoch_1.bin --res_path results/bert-sst2 --num-layer 24 --num-expert 128 --templates bert.encoder.layer.{}.intermediate.dense.weight
My output displays 24 counters with values like this:
Counter({4: 32, 20: 32, 27: 32, 42: 32, 116: 32, 67: 32, 85: 32, 48: 32, 101: 32, 13: 32, 79: 32, 118: 32, 63: 32, 127: 32, 80: 32, 90: 32, 82: 32, 34: 32, 113: 32, 21: 32, 64: 32, 59: 32, 105: 32, 15: 32, 102: 32, 121: 32, 25: 32, 23: 32, 95: 32, 17: 32, 19: 32, 103: 32, 26: 32, 99: 32, 72: 32, 55: 32, 97: 32, 7: 32, 107: 32, 122: 32, 96: 32, 125: 32, 62: 32, 11: 32, 18: 32, 65: 32, 52: 32, 98: 32, 9: 32, 38: 32, 76: 32, 124: 32, 91: 32, 84: 32, 126: 32, 8: 32, 60: 32, 0: 32, 2: 32, 104: 32, 74: 32, 24: 32, 70: 32, 44: 32, 10: 32, 30: 32, 106: 32, 35: 32, 58: 32, 47: 32, 39: 32, 29: 32, 36: 32, 111: 32, 68: 32, 61: 32, 56: 32, 46: 32, 114: 32, 1: 32, 78: 32, 32: 32, 53: 32, 83: 32, 109: 32, 37: 32, 117: 32, 89: 32, 49: 32, 28: 32, 112: 32, 77: 32, 40: 32, 123: 32, 3: 32, 43: 32, 93: 32, 92: 32, 120: 32, 69: 32, 31: 32, 57: 32, 41: 32, 16: 32, 110: 32, 119: 32, 66: 32, 50: 32, 87: 32, 86: 32, 54: 32, 115: 32, 108: 32, 73: 32, 5: 32, 33: 32, 88: 32, 22: 32, 94: 32, 71: 32, 14: 32, 12: 32, 75: 32, 51: 32, 45: 32, 6: 32, 100: 32, 81: 32})
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [08:47<00:00, 21.97s/it]
What does this output represent? My guess: 24 is the number of layers, the 128 entries in each counter (e.g. 4: 32, 20: 32, ...) represent the 128 experts, and 32 is the number of neurons per expert. Correct me if I am wrong, and please let me know.
Thank you
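For reference, this counter shape is consistent with balanced clustering: assuming BERT-large's FFN intermediate size of 4096 split into 128 equal-size experts, each expert contains 4096 / 128 = 32 neurons, and counting the per-neuron cluster labels yields 128 keys that each map to 32. A minimal sketch reproducing the structure (the random assignment below is a stand-in for the script's actual parameter clustering):

```python
from collections import Counter
import numpy as np

# Stand-in for the script's balanced clustering (assumed sizes: BERT-large
# FFN intermediate dimension 4096, split into 128 equal-size experts).
num_neurons, num_experts = 4096, 128
neurons_per_expert = num_neurons // num_experts  # 32

# Assign each neuron a cluster label; a real run derives labels from the
# layer's weight matrix, but balanced clustering yields the same counts.
labels = np.repeat(np.arange(num_experts), neurons_per_expert)
np.random.shuffle(labels)

# Counting labels reproduces the printed structure: 128 keys, each -> 32.
print(Counter(labels.tolist()))
```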