Reduce size of the data files #11

Open
fingolfin opened this issue Feb 22, 2024 · 6 comments

@fingolfin
Member

Currently there are 5 compressed data files (QUIMP[1-5].tar.bz2) in the repository which take up 9-69 MB each for a total of 190 MB. The user has to extract them for a total of 770 MB.

This should be reduced. Here are several ideas for this, which can be combined.

First off, GAP can transparently access .gz files. This suggests storing not e.g. lib/QUIMP_336.g but rather lib/QUIMP_336.g.gz in the archives, so that disk space usage is reduced for the end user. The result is "only" 270 MB.

This would in fact allow shipping the files "directly" to the user, without a need for .tar.bz2 files. The archives could then also be removed from the repository, which would be better anyway; we could instead keep the lib/QUIMP_*.g files in the repository directly (and compress them on the fly for releases, which we already do for multiple other packages).
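
To illustrate the transparent .gz access mentioned above, here is a minimal sketch. It assumes a `gzip` executable is on the PATH; the file name `demo_data.g` and the variable `DEMO_DATA` are made up for the example and are not part of the package.

```gap
# Sketch only: write a tiny data file, gzip it, then read it back
# under its original (uncompressed) name.
dir := DirectoryTemporary();;
f := Filename(dir, "demo_data.g");;
PrintTo(f, "DEMO_DATA := [ (1,2,3), (1,2) ];\n");
# Assumption: gzip is available; this replaces demo_data.g by demo_data.g.gz.
Exec(Concatenation("gzip ", f));
# Only demo_data.g.gz exists now; GAP should fall back to it.
Read(f);
Print(DEMO_DATA, "\n");
```

If this behaves as expected on the supported platforms, the data files could be shipped as lib/QUIMP_*.g.gz without changing the code that reads them.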

Next, the content of the lib/QUIMP_*.g files could be optimized further.

@aniemeyer suggests that for many groups a good way to compress them is to store generators in a different, minimal-degree representation, together with generators of a subgroup such that the action on the cosets of that subgroup gives the actual QUIMP permutations. Indeed, take for example QuimpGroup(4080,1). In the file lib/QUIMP_4080.g it takes up more than 0.5 MB. But it is $A_{17}$ in disguise. So one could replace the generators by the information "this is A17" plus generators for the point stabilizer:

gap> G := QuimpGroup(4080,1);
<permutation group with 2 generators>
gap> IsAlternatingGroup(G);
true
gap> Size(G);
177843714048000
gap> Size(AlternatingGroup(17));
177843714048000
gap> A := AlternatingGroup(17);;
gap> iso:=IsomorphismGroups(A,G);;
gap> S:=PreImages(iso,Stabilizer(G,1));
Group([ (2,12,15,3,9), (1,16,8,6,9,3,2,11,17)(12,13,14), (1,16,17,13,11), (1,16,13,11)(2,4,14,17) ])
gap> SmallGeneratingSet(S);
[ (1,12,15,14,11,2,8,6,3,16,13,9)(4,17), (1,13,12)(2,3,11,16,6)(4,15,14)(8,17,9) ]
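
The reverse direction, rebuilding a degree-4080 permutation group from "A17 plus a point stabilizer", could be sketched as follows. The two subgroup generators are copied from the SmallGeneratingSet output above; everything else is standard GAP. Note that FactorCosetAction numbers the cosets in its own way, so the result is only permutation isomorphic to QuimpGroup(4080,1); reproducing the exact stored permutations would additionally need a fixed ordering of the cosets, which this sketch does not address.

```gap
# Sketch only: the compact data is "A17" plus two generators of the stabilizer.
A := AlternatingGroup(17);;
S := Subgroup(A, [
  (1,12,15,14,11,2,8,6,3,16,13,9)(4,17),
  (1,13,12)(2,3,11,16,6)(4,15,14)(8,17,9) ]);;
# Should be 4080 if S is the full point stabilizer, as in the session above.
idx := Index(A, S);
# Permutation action of A on the right cosets of S.
hom := FactorCosetAction(A, S);;
H := Image(hom);;
# Degree of the rebuilt group, and a check that no information was lost.
NrMovedPoints(H);
Size(H) = Size(A);
```
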
@fingolfin
Member Author

To stay with the QuimpGroup(4080,1) example: in each entry, three groups are stored:

  • QUIMP_4080[1][1] is the group itself;
  • QUIMP_4080[1][3] is the socle;
  • QUIMP_4080[1][4] is... perhaps the group T if the socle is T^k? But I didn't see any references to this in the code.

@DominikBernhardt is the format of the data files documented somewhere?

Anyway, in this specific example all three groups are the same. I think the socle should always be expressed in terms of the generators of the full group, perhaps via words in the generators. Doing so, I think this entry of more than 500 kB could be shrunk by a factor of 500.

It won't be as dramatic everywhere, but I am hopeful we can reduce by at least an order of magnitude.

@fingolfin
Member Author

For the socle, we can in fact just store (information about) a normal generating set, to be fed into NormalClosure. If the socle is $T^k$ then often it will suffice to store generators for $T$.
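
As a small illustration of the NormalClosure idea, here is a sketch on a toy group rather than on actual QUIMP data: the wreath product of $A_5$ with $S_2$, whose socle is $T^2$ with $T = A_5$. Only one factor $T$ is "stored", and the socle is recovered from it.

```gap
# Sketch only: toy group with socle T^2, T = A5, acting on 10 points.
T := AlternatingGroup(5);;
G := WreathProduct(T, SymmetricGroup(2));;
# One copy of T inside G (the first base factor); this is all we would store.
T1 := Image(Embedding(G, 1));;
# The normal closure of a single factor should recover the whole socle.
N := NormalClosure(G, T1);;
Size(N);
N = Socle(G);
```

Here storing two generators of one factor is enough; whether this works as smoothly for every QUIMP entry is not something this sketch shows.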

@fingolfin
Member Author

Also, for the name field, at least for the AS cases, it seems the content is just what IsomorphismTypeInfoFiniteSimpleGroup gives us. In that case I don't see a point in storing that, I'd just compute it on the fly.
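
Computing the name on the fly could look like the following sketch, shown here on AlternatingGroup(17) as a stand-in for the AS-type group from the earlier example. Whether the resulting string matches the stored name field in every case would still need to be checked against the data files.

```gap
# Sketch only: derive the name instead of storing it.
G := AlternatingGroup(17);;
info := IsomorphismTypeInfoFiniteSimpleGroup(G);;
# The record describes the isomorphism type; .name is a printable string.
Print(info.name, "\n");
info.series;
info.parameter;
```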

@fingolfin
Member Author

@glukemorgan I just learned from @aniemeyer that you already have a "small" / "reduced size" version of the data files. Is that correct? If so, perhaps you'd be willing to share it and then we could integrate it here and finally get this package released to a wider audience...

@glukemorgan

Hi @fingolfin, sorry, I don't log in to GitHub too often, I just saw this.
Yes, I have a smaller version of the PA groups. How can I add it?

@fingolfin
Member Author

Hi @glukemorgan, and sorry: I saw your message and then it got lost in the stack (sigh).

You could add them via a pull request. Or you could email them to me and I can integrate them.

If you prefer, we can also continue the conversation via email (reach me at mhorn (AT) rptu.de).
