-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Entropy increase suggestion - math part only #79
Comments
Hi, thanks for reaching out! It's definitively interesting! It sounds similar to a whitening transformation: If you want to share more details, we might try to develop a code to perform the transformation/filter - reading in a random data and performing the operation you describe. I have also checked your other project: Could you please point me to a script that performs the password generation as described in the README file? Thanks a lot |
Ok great here's the formula and I'll explain step by step all its tricks. M - mixed arrangements (number of arrangements that form based on elements repeating or not for a certain pattern). Bellow two examples, 1st when T≥S (typically for a pwd calculation) T=8, S=8 2+2+1+1+1+1=8 (took the beefiest bbp.) Note: there's no way even if the string is huge inevitable some patterns are weak (have very low number of arr.) that can be Always the sum of arrangements from all slices have to add up to T^S => Σ=T^S, this is good way to verify if the script or 2nd example when T<S (string longer than available unique elements, e.g useful for binary/decimal/hexadecimal The reason best pattern in the case of 40 bits is asymmetric is because once an 'm' shows up twice, the second factor As a general rule an 'm' should be as small possible (m=2) but not 0 (in that case there's no amplification factor at all but still Another general rule when T is significantly bigger than S the most beefy patterns are those that tend to have the least Bellow the script sum to S (see some comments inside script) and an example of calculation of a large string. I'll add a script `def NumberOfways(N, K):
Driver CodeN = 40 ## S string length (when T≥S both parmeters N an K should be the string length; when T<S, N=S and K=T) print(NumberOfways(N, K))` |
Hi, I'm sorry for the delay in my communication. I was busy with other stuff. I have reviewed your formula and in fact, this is a frequency check. This is already covered by the internal continuous/life tests - please see https://github.com/jirka-h/haveged/blob/master/README.md, section "TESTING haveged": Run-time tests are broken into 2 groups, Procedure A containing 5 tests, and Procedure B containing 3 tests. Procedure A consists of performing a 48-bit disjointedness test on 64K sequences, followed by 257 repetitions of the four FIPS-140-1 tests and an auto-correlation test on successive 2000 bit sequences. Procedure B performs distribution tests for 10,000 occurrences of 1, 2, 3, 4 bit runs in successive samples, followed by a entropy estimate based upon on another 256000+2560 bit sample. A sample must pass all individual tests to pass the procedure. To test it yourself, you might want to examine these random strings consisting of all printable characters, not including space ([:graph:]). Here is the sample command and it's output - feel free to modify the length of random strings as needed:
Thanks |
Hello, no problem Now my proposal is to do it like this: Calculate the required/desired entropy. For example when we need T=94 and S=16 => T^S=94^16 = 3,71×10³¹ (maximum possible). To avoid weak patterns that give weak strings (0.83% etc) were gonna increase the length of the string and 'target' the beefiest pattern. So we have T=94 but increase S=16+1=17 In this case best pattern (chassis or array however we wanna call it) is that with 21111111111111110 (similar but huge numerical different) So now were gonna have our 94^16=3,71×10³¹ desired entropy inside only one pattern 1,30×10³³ (just by increasing the length This way ALL (not only SOME) strings gonna have desired entropy 1,30×10³³. So we have to instruct our pwd generator to give ONE element at random that repeats, in random way, and 15 elements at random that are uniques (show up only once) => 21111111111111110 (the TWO in the pattern symbolizes that ONLY one element repeat) This concept of TARGETING RANDOMNESS ensures no accidents happens or better said eliminates completely accidents of a bad string. These patterns have to be predefined (pre-calculated) so the kernel doesn't waste time calculate things at that low level. For this I've developed a python pwd calculator script. I'll give an example to check it out how to use it. so for a pattern let's say 443222110 T=94, S=19 ( S =4+4+3+2+2+2+1+1=19) 'ems' are repetitions 4 3 2 If pattern has no 'ems' and hence no 'ens' e.g patterns like 1111111111.. consisting in unique elements only, just hit enter. Here's the python script to calc. best patterns. I'll provide you (if you're interested further) with a so call benchmark bash script that I made to check string patterns. But here's the pwd calc I've made. `print("enter your T and S") J = T**S print("your ems") print("insert your ens e.g. 4 3 2 1") Resize the arrays by filling with zerosarray1 = array1 + [0, 0, 0, 0] m7, m6, m5, m4 = array1[:4] ens = int(n7) + int(n6) + int(n5) + int(n4) import math a = math.factorial(T) Thanx for your interest. As a conclusion from my point of view, strings generated by your method may be good, but as I see it has no 'mechanism' in getting rid of 'accident string' nor other similar generators that consider whole cakes of possibilities. I'm 100% sure a targeted randomness and combined with your approach would give best results. Thanks again for your time and effort putted into this bold project of yours. Cheers! |
Hi, thanks a lot for the detailed walk through! It helped me a lot to fully understand your thought process and proposed method. I'm thinking about writing a script which would read random bytes from the stdin (for example from /dev/random) and create passwords of desired length from given alphabet using the concept of TARGETING RANDOMNESS. In essence, it would work similarly as naive approach with Now to make the script user friendly, you will be required to provide only password length and have methods to alter the alphabet. The script would then
I will share an inital version once it's ready. It might take a while because of summer holidays. Stay tuned. Thanks |
Great news! I'm confident this approach of aiming at the core of these arrangements will considerably increase security/strength of pwds and not only. Here's that benchmark script to obtain its pattern from a string. You just need to insert the string in the 'fin' file then execute x.sh. (not the best coding skills but it works). https://github.com/B1u3l1ght/Old-School-Pass/blob/main/pass_bench.zip Because an adversary doesn't know the chosen pwd length it can not know which pattern is used even if in the future will hear about this method. For the case where string length is known can be used t random top 3 or 4 strongest patterns to leave even the most determined attackers into a smoke curtain. Thanks and happy holidays 🌴🌴🌴 😉 |
Returned briefly for better solution obtaining bbp. from strings. These takes randomly generated strings from an 'a' file (not just one string at a time, column format) and gives their bbp in 'b' file and later can be assessed their strength via pwd calculator. perl same thing but in python And another important script this one that gives all possible bbp. upon provided S. When T happens to be smaller than S some of these bbp. do not count. bbp's have to be ways to sum to S starting from at least 1 factor till at the most T number of factors (and not S number of factors like in the T≥S case) https://github.com/B1u3l1ght/Old-School-Pass/blob/main/bbp.py Tnx and have a nice weekend. |
Thanks a lot for sharing your scripts! I have used the perl script to verify that my initial passwd generator works correctly, generating passwords according to the input backbone pattern. The next step that I need to implement is to
To generate all bbps, I will reuse the code from your script bbp.py. Could you share the code to compute the probability for given bbp, given the value of T and S? Thanks! |
I only have this mix.py script that expects user input for T and S, 'ems' and its 'ens'. Coding not my strongest point currently.. I guess it can be modified so it takes patterns generated by the bbp.py and feed the modified mix.py script. I've decided to express probability as percentage. The number of possible arrangements of the bbp divided by all possible arrangements T^S. So it's ((bbp arr. number)/T^S)*100 This part of the code (at the bottom) from mix.py calculates probability: However the more we increase string length the more bbp form (cake can be broken into more slices) and so the % goes even at 3-4% for the strongest but those weaker goes into the 0.04% or even 10^(-9)%. I think there's need to modify bbp.py too so it can output those bbp's that match the case when T<S. Currently it gives good results only for T≥S. For long string e.g 63 it can give huge number of patterns about 1.5 mil so it can crash the system. Tnx. |
Thanks for the pointers - I can take it from here.
I will try to come up with a initial version for T>S case in a week or so. Stay tuned. |
In the meantime fixed the problem. To boost python precision had to install a module called python-mpmath. Everything checks perfectly while mapping a whole 94^16 (16 chars long pwd with a search space depth 94, without space char). You can check it out here I'm gonna 'correct' python script mix.py and add mpmath syntax. There's need to kick precision as high as 1000. edit: Added the fixed version of the mix.py now called remix.py Used 'abs' (absolute syntax) as I've seen it gives higher accuracy this way. Cheers and tnx. |
Thanks for the updates! I'm leaving for summer vacation and I will have no computer with me. I will look into this in August when I'm back. Enjoy the summer (if you are based on The Northern Hemisphere) |
Ok great, I'll take couple of days off too going to some mountain region for cooler air cos here somewhere in the South-Eastern EU is boiling hot 🫠. Have a nice relaxing time! 🌴 🎾 🍹 |
Hello. How was your vacation or you still on it 😉. Made a larger simulation with your generator as is currently. Tried to simulate 1500 users generating their strings randomly. The result highlights are as follow:
Another observation I've made is that the moment we extract in bulk many random strings generator seems to perform better (strings have high entropy), when we extract fewer at once, generator seems to lose some entropy stamina. You can see results in the image bellow. The red line is the 'comfort zone', everything bellow that is 10x weaker. As a solution so we don't have to calculate every time specially for very long strings we can instruct random generator to sample 10-20 (or even 50, the higher the the better certainty) strings extract their patterns and the one that shows up more often than the others should be used as being the best. EDIT 1 : however for the large strings but not only for those still can happen some bbp. to show up as often as other even if are not that strong so to solve which one is better should still go thru formula that can finally establish the winning bbp. Cheers! |
Hi, thanks a lot for running the larger simulation! I have checked the frequencies for the first three categories and they match the expected values 595/1500: 39.6% (expected 39.25%) This verifies that haveged RNG runs correctly and also that the formulas and their implementation are accurate. Creating a Python password generator is still on my TODO list. While I'm back from the main vacation, I'm still offline most of the time, either traveling or being busy with other assignments. Realistically speaking, I will be able to look into it by mid-September when the school vacations are over and the kids are back at school. I'm sorry, but I need to balance my family life. Thanks for the understanding and your patience! |
That's ok there's no rush. Regarding RNG and the formula, yeah both of them work correctly and theory meets practice and vice versa. Once the new pwd generator will be done my bet is that will be the new reference point on pwd realm and not only. Cryptography in general can be significantly improved also. Have a nice weekend! |
Hello Jirka, Happy New Year! I'm having some novelties. For even a deeper dive into this complex task we started here. Example of a T pattern: 32 26 26 10 = 94 = T Surprisingly OEIS guys were significantly more pleased about this one but still there's a lot of slow advance as their foundation works too on donations business model. Since it got broader and we introduced a T pattern (grouping) named this kind of arr. Split Arrangements (S4) S index 4 means there are 4 groups in this case (32 26 26 10). S (pattern) 6 5 4 1 The above it's established as being the strongest S pattern 6 5 4 1 when we consider 32 26 26 10 grouping. So based on the grouping were gonna have different strongest S array. T grouping can not contain any 0 (zeroes) so it can not be 32 0 26 26 10 while the S array can have zeroes like 10 6 0 0 10+16 = 16 = S array Now the previous formula (mixed arr.) always took one way of grouping, per element only. And so T pattern was always like: 1 1 1 1 1..1 longer or shorter upon T number of elements but if we need to group elements based on their type the 111..1 T pattern not gonna be much helpful. Bellow I've attached an img. that shows the complete 'dissection' when T=6 and S=5, 6^5 = 7776 including the calc model. For the record one of the OEIS editors proposed this form for the 1st part of formula (dealing with T grouping) And here the Hyperstring thing. Each of those 6^5 = 7776 arr. belongs to 10 S pattern relative to a T pattern ( a bit ambiguous here). Why 10? In this case there are 11 ways to sum to T=6 but because one of the way means putting all of them in a large pile we don't need to count that as all of them have it. Those that are more numerous no matter what and can not be singled out in any way shape or form I proposed calling them Hyperstrings. Those highlighted in red rectangle are the weakest. Because I've used low numbers 240 doesn't look like much but if we scale things up in 94^16 or 94^63 Hyperstrings must be the holy grail of pwds. Currently I have no formula to calculate them. Here they are: Worked on a script with the help of Artix community that filled the black in my python skills .. This script you can edit T groups depending on liking, it counts and categorizes a text file full of strings. Can be used to establish the best S array for a certain T grouping counting which one shows more often. This is true till we hit big numbers aspect that I've previously talked for a bit. B1u3l1ght/Old-School-Pass@4c3bb23 Hope you'll gonna find this interesting 🤞🏻trying to make this as a Christmas/break time gift for the open source community. Peace, Blu3l1ght ✌🏻 |
Forgot to add a larger example when T=94, S=16 We can choose the other way around as long as we keep 6541 grouping however we can see the bigger numbers are behind the above (6 symbols, 5 letters, 4 low cap, 1 number). Upper right corner (green rectangle) we have confirmation about that grouping as being the strongest. Additional letters were used so to manage more easily those powers. You can see there are 4 groups so A(4,4)=24. So this 'base' 32 26 26 10 (the grouping) will show up 24 times but each time different till exhaust all those 24 possibilities that arise. Blu3l1ght✌️ |
Hello B1u3l1ght, Happy New Year to you, too! I really like this new approach. This addresses the common need to have a certain character classes in the password. I have created a script similar to your Now, here is the outline of the password generator:
This way, we can generate strong passwords and make sure that we have all categories as needed at the same time. I have a very good feeling about this. I have chunks of the code ready, now I have to integrate it. Let me know what you think. I will give it now a higher priority and will come up with first version of the code in a week or so. Jirka Script to count the frequencies of number of chars in each category.
Usage and sample output for 10,000 random passwords:
|
Looks great👍🏻. But it will take a bit more till I can analyze it properly again math layer only as I'm absolutely sure you're doing a great job not only code related but mathematically wise too.👍🏻 Had some more analogical work to do lately but glad I could sneak this newer approach and check your great work too. I'll keep an eye on it and ask you when I'm having any ambiguities but from the birds eye view it looks great. Hope we can create a solid standard here or at least a good start for it as there were lots of suppositions on this illusive topic and saw even high profile personalities that said many cloudy and unrealistic things on these matters. Have a great weekend! Blu3l1ght ✌🏻 |
Hello, I've heard about this amazing project and its goal to increase entropy for a stronger security. I've developed recently after a long road (3-4 years) of personal research a formula that might be used to obtain superior entropy, either we are talking about password generation or bits of entropy. This formula was submitted to OEIS and was approved twice, but without an explanation as we speak it remained in proposal status, from what I've understand the chosen name for the formula was not on editors liking.
To prove my point making this story short I'm gonna briefly show mixed arrangements (how I called them) 2^40, 40 bits based on element repeating (knowing processors work only with 1 and 0). The Idea is simple each string belongs to a pattern these patters can have more or less variants, the goal is to use that pattern that holds the bigger number of variants to obtain superior entropy. From the example bellow we can see the least numerous pattern is the first one that holds only 2 variants (string formed from only 1, and second the string is formed from 0). Based on this criteria patterns hold bigger or smaller number of possible arrangements. The exact calculation of these patterns can be made only thru my recently discovered formula. Now my proposal is that instead of randomly choose from all those possible patterns to use only the 'slice' with the bigger amount of arrangements and because this pattern is still lower than the whole to just increase the length of the string to match the previous entropy but using only that bigger slice.
So instead of 40 bits to have 43 but use from it only that slice with the biggest number of arrangements. For 44 bit string the beefiest slice would be when we have like 21 of 1 and 22 of 0 or vice versa 21 of 0 and 22 of 1. That pattern/slice being approx. 2x bigger (~2*10^12) than the whole 40 bits (~10^12) The reason to do this would be to rule out those extremely guessable patterns and their strings. So instead of having that good old random were gonna have a so called targeted random minimizing almost completely the eventuality of anyone guessing kernel memory allocations.
If I've managed to lit up some light of interest I can go further into details about formula and so on. Last thing that I wanna add is that I'm not a programmer (just some basic stuff some python here and there etc.) the only part I can touch is the math part of this topic.
M symbolizes mixed arrangements for different patterns.
The text was updated successfully, but these errors were encountered: