Update source checksums across the distro #909

Open
dmacks opened this issue Apr 13, 2022 · 18 comments

Comments

@dmacks
Member

dmacks commented Apr 13, 2022

Fink originally used MD5 for sources. Then we added support for SHA1 and SHA256. Many (maybe even most) packages still use MD5, and a bunch use SHA1. Should we work on upgrading the checksum fields to SHA256? The current SHA256 checksum feature requires any one of:

  • Apple openssl
  • Fink openssl
  • Fink coreutils
  • Fink md5deep

so I think it's safe to assume it will work everywhere without hauling anything else into fink core or requiring an extra BuildDepends.
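
A minimal sketch of the "any one of these tools" idea: try each candidate in turn and use the first one installed. The function name `sha256_of` and the lookup order are assumptions for illustration, not how fink itself picks a backend; `sha256sum` is the coreutils tool, `shasum` ships with the system perl, and md5deep is left out here.

```bash
#!/bin/bash
# Print the SHA256 hex digest of a file using whichever tool is installed.
# Hypothetical helper for illustration; not part of fink.
sha256_of() {
	local file="$1"
	[ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
	if command -v sha256sum >/dev/null 2>&1; then
		# reading from stdin keeps the file name out of the output
		sha256sum < "$file" | cut -d' ' -f1
	elif command -v shasum >/dev/null 2>&1; then
		shasum -a 256 < "$file" | cut -d' ' -f1
	elif command -v openssl >/dev/null 2>&1; then
		# openssl prints "...(stdin)= <hash>"; the hash is the last field
		openssl dgst -sha256 < "$file" | awk '{print $NF}'
	else
		echo "no SHA256 tool found" >&2
		return 1
	fi
}
```

Calling `sha256_of /sw/src/foo.tar.gz` (a placeholder path) prints only the digest, which can then be pasted into a `Source-Checksum: SHA256(...)` field.
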
@nieder
Member

nieder commented Apr 14, 2022

  1. If Source-Checksum and Source-MD5 are both present, only Source-Checksum is checked.
  2. Source-Checksum: MD5(xxx) is a valid construct, so it could be used as an easily scriptable replacement for Source-MD5 w/out having to recalculate checksums for tarballs.
  3. 4958 info files use MD5, 267 use SHA1, 960 use SHA256.
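
Point 2 can be sketched as a stream filter, assuming the field sits at the start of a line and holds a plain 32-digit hex MD5. The function name `md5_field_to_checksum` is made up for this example. Because of point 1, a file that already has a Source-Checksum would end up with two such fields, so those files should be skipped or handled by hand.

```bash
#!/bin/bash
# Rewrite "SourceN-MD5: <hash>" lines as "SourceN-Checksum: MD5(<hash>)".
# Reads a .info file on stdin and writes the result to stdout.
# Lines that do not look like a Source-MD5 field pass through unchanged.
md5_field_to_checksum() {
	sed -E 's/^([Ss]ource[0-9]*)-[Mm][Dd]5:[[:space:]]*([0-9A-Fa-f]{32})[[:space:]]*$/\1-Checksum: MD5(\2)/'
}

# Example (foo.info is a placeholder name): preview the change without editing the file
#   md5_field_to_checksum < foo.info | diff foo.info -
```

No tarball has to be downloaded for this step, since the existing hash is only wrapped, not recalculated.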

@nieder
Member

nieder commented Apr 17, 2022

Doing this by hand is going to be nigh impossible and take forever.

@dmacks
Member Author

dmacks commented Apr 17, 2022

Yup, item (2) is easily scriptable but item (1) is mindless but only partially scriptable at best. Assuming SHA256 is state-of-the-art, should we set a goal of at least always using it for new packages and switching to it whenever we upgrade to a new version?

@nieder
Member

nieder commented Apr 18, 2022

Definitely should use SHA256 on any new packages and preferably on updates as well. Scripting to Source-Checksum: MD5() would help with that transition.

@nieder
Member

nieder commented Apr 18, 2022

Also, this seems to work on most things that only have Source-MD5: (drops anything with variants)

#!/bin/bash
# Replace Source-MD5 with Source-Checksum: SHA256(...) in every .info file in the
# current directory, but only after the tarball's MD5 has been verified against the .info.
# SOURCE_LEVEL="" (empty) for the primary source
SOURCE_LEVEL="2"
# Fink source directory (assumes the /sw prefix; adjust for other prefixes)
SRC_DIR="/sw/src"
for i in *.info; do
	# the glob stays literal when there are no .info files
	[ -e "$i" ] || continue
	echo "transitioning $i"
	unset PKG_SOURCE PKG_FULL_SOURCE PKG_SOURCE_MD5 PKG_INFO_MD5
	if ! grep -qi "^Source${SOURCE_LEVEL}-Checksum:" "$i"; then
		# only look in files not using Source-Checksum (can have it and Source-MD5, so don't just search for Source-MD5)
		PKG_NAME=$(grep -i -m 1 "^Package:" "$i" | cut -f 2 -d: | tr -d '[:space:]')
		if [[ "$PKG_NAME" == *"type_"* ]]; then
			echo -e "File $i has varianted packages $PKG_NAME. Skipping...\n"
			continue
		fi
		if grep -qi "^Type: bundle" "$i"; then
			echo -e "$PKG_NAME is a bundle with no source. Skipping...\n"
			continue
		fi
		# skip files restricted to a distribution other than the hard-coded one
		if grep -i "^Distribution:" "$i" | grep -q -v "10.14.5"; then
			echo -e "$PKG_NAME Not available in this dist. Skipping...\n"
			continue
		fi
		if grep -qi "^Source${SOURCE_LEVEL}Rename:" "$i"; then
			echo "Pkg uses Source${SOURCE_LEVEL}Rename"
			PKG_SOURCE=$(fink dumpinfo -fsource${SOURCE_LEVEL}rename "$PKG_NAME" | cut -f 2 -d' ')
		else
			PKG_FULL_SOURCE=$(fink dumpinfo -fsource${SOURCE_LEVEL} "$PKG_NAME" | cut -f 2 -d' ')
			if [[ -z $PKG_FULL_SOURCE ]]; then
				echo -e "$PKG_NAME does not have a source${SOURCE_LEVEL} file. Skipping...\n"
				continue
			elif [[ "$PKG_FULL_SOURCE" == "mirror:"* ]]; then
				echo "$PKG_NAME uses a mirror source"
				# keep only what follows the last colon, then take the file name
				PKG_SOURCE=$(basename "${PKG_FULL_SOURCE##*:}")
			else
				echo "just trim the basename"
				PKG_SOURCE=$(basename "$PKG_FULL_SOURCE")
			fi
		fi
		echo "PKG_FULL_SOURCE: $PKG_FULL_SOURCE"
		echo "PKG_SOURCE: $PKG_SOURCE"
		if [ ! -f "$SRC_DIR/$PKG_SOURCE" ]; then
			fink -y fetch "$PKG_NAME"
		fi
		if [ -f "$SRC_DIR/$PKG_SOURCE" ]; then
			PKG_SOURCE_MD5=$(md5sum "$SRC_DIR/$PKG_SOURCE" | cut -f 1 -d' ')
		fi
		PKG_INFO_MD5=$(grep -i -m 1 "^Source${SOURCE_LEVEL}-MD5:" "$i" | cut -f 2 -d: | tr -d '[:space:]')
		echo "Do the MD5 sums agree?"
		echo "PKG_SOURCE_MD5: $PKG_SOURCE_MD5"
		echo "PKG_INFO_MD5: $PKG_INFO_MD5"
		# an empty PKG_SOURCE_MD5 means the tarball could not be fetched; treat that as a mismatch
		if [[ -n "$PKG_SOURCE_MD5" && "$PKG_INFO_MD5" == "$PKG_SOURCE_MD5" ]]; then
			PKG_SOURCE_SHA256=$(shasum -a 256 "$SRC_DIR/$PKG_SOURCE" | cut -f 1 -d' ')
			echo "replacing Source${SOURCE_LEVEL}-MD5 with Source${SOURCE_LEVEL}-Checksum: SHA256"
			perl -pi -e "s|^[sS]ource${SOURCE_LEVEL}-[mM][dD]5: (.*)$|Source${SOURCE_LEVEL}-Checksum: SHA256($PKG_SOURCE_SHA256)|g" "$i"
		else
			echo "MD5 of tarball doesn't match .info. Will log this to ~/fink-md5-check.log"
			echo "$PKG_NAME : $PKG_SOURCE $PKG_SOURCE_MD5 $PKG_INFO_MD5 do not match" >> ~/fink-md5-check.log
		fi
	else
		echo "Package already uses Source-Checksum. Nothing to change"
	fi
	echo ""
done

echo "Done parsing folder."

There are probably cuter ways to do it via perl and directly calling the package manager for package and file names, but we don't need perfect. Just good enough to get the bulk of the packages.

@babayoshihiko
Member

The pull request about CRAN packages has SHA256 for new versions. Can anyone check and approve it?

#902

@dmacks
Member Author

dmacks commented Apr 24, 2022

> The pull request about CRAN packages has SHA256 for new versions. Can anyone check and approve it?
>
> #902

done

@babayoshihiko
Member

I have an error:

curl --connect-timeout 30 -f -L -A 'fink/0.45.6' -O http://download.icu-project.org/files/icu4c/55.1/icu4c-55_1-src.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   357  100   357    0     0   1089      0 --:--:-- --:--:-- --:--:--  1091
100   334  100   334    0     0    328      0  0:00:01  0:00:01 --:--:--   646
100   327  100   327    0     0    245      0  0:00:01  0:00:01 --:--:--   245
100  223k    0  223k    0     0   120k      0 --:--:--  0:00:01 --:--:--  508k
The SHA256 checksum of the file is incorrect. The most likely cause for this is a corrupted or incomplete download
Expected: e16b22cbefdd354bec114541f7849a12f8fc2015320ca5282ee4fd787571457b
Actual: MD5(8c96f044a55feae48ddc8da386c18e67)
        SHA1(11396ed355191b62ab9792e0ac279e40d5fda418)
        SHA256(6914440592a89837bb866a982b26b42f5d4114360c913e3d715e1b76b54f85b6)
Downloading the file "icu4c-55_1-src.tgz" failed.

@nieder
Member

nieder commented May 12, 2022

> I have an error:
>
> curl --connect-timeout 30 -f -L -A 'fink/0.45.6' -O http://download.icu-project.org/files/icu4c/55.1/icu4c-55_1-src.tgz
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   357  100   357    0     0   1089      0 --:--:-- --:--:-- --:--:--  1091
> 100   334  100   334    0     0    328      0  0:00:01  0:00:01 --:--:--   646
> 100   327  100   327    0     0    245      0  0:00:01  0:00:01 --:--:--   245
> 100  223k    0  223k    0     0   120k      0 --:--:--  0:00:01 --:--:--  508k
> The SHA256 checksum of the file is incorrect. The most likely cause for this is a corrupted or incomplete download

Fixed. The download path changed, and what you were getting was just a (non-redirecting) HTML file. I've added upstream's mirror. The source still exists in our own master mirror, so you can also get it from there.
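
A quick way to spot this kind of failure before looking at checksums is to test whether the downloaded file is gzip data at all. This is a sketch with a made-up helper name, `looks_like_gzip`; it only checks the two gzip magic bytes (1f 8b), so it would apply to `.tgz` and `.tar.gz` sources and not to other archive formats.

```bash
#!/bin/bash
# Succeed if the file starts with the gzip magic bytes 1f 8b.
# An HTML error page saved under a .tgz name fails this check.
looks_like_gzip() {
	local magic
	# dump the first two bytes as hex and drop all whitespace
	magic=$(od -An -tx1 -N2 "$1" | tr -d '[:space:]')
	[ "$magic" = "1f8b" ]
}

# Example (placeholder path):
#   looks_like_gzip /sw/src/icu4c-55_1-src.tgz || echo "not a gzip file, probably an HTML page"
```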

@dhomeier
Contributor

The asciidoc source I just tried to download has slightly changed from the one I had downloaded last year, with conflicting shasums of course:

> ls -l /opt/sw/src/asciidoc-8.6.10.tar.gz /opt/sw2/src/asciidoc-8.6.10.tar.gz                                                          
-rw-r--r--  1 root  wheel  577208 28 Feb  2022 /opt/sw/src/asciidoc-8.6.10.tar.gz
-rw-r--r--  1 root  wheel  577182 12 Jan 18:20 /opt/sw2/src/asciidoc-8.6.10.tar.gz
> shasum /opt/sw/src/asciidoc-8.6.10.tar.gz /opt/sw2/src/asciidoc-8.6.10.tar.gz                                                      
53b9c916bb4e29d2a4b850446be070ef81dcd792  /opt/sw/src/asciidoc-8.6.10.tar.gz
3df412406e37d1d4674ddc0b08a074d67c8d7fa0  /opt/sw2/src/asciidoc-8.6.10.tar.gz

on all mirrors I could reach and the original URL.

@nieder
Member

nieder commented Jan 13, 2023

Can you unpack them into separate directories and see what the difference is? Sometimes there's a silent update that's just a repackaging, and that changes the checksum.
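
One way to do that comparison, as a sketch: unpack both tarballs into temporary directories and run a recursive diff. The function name `compare_tarballs` is hypothetical. It assumes gzip-compressed tarballs with a single top-level directory, which it strips with `--strip-components=1` so that a renamed top-level directory alone does not hide file-level changes.

```bash
#!/bin/bash
# Unpack two gzip tarballs into separate temporary directories and print a
# recursive unified diff. Returns 0 if the contents are identical, 1 if they
# differ, 2 if unpacking failed.
compare_tarballs() {
	local old="$1" new="$2" tmp status
	tmp=$(mktemp -d) || return 2
	mkdir "$tmp/old" "$tmp/new"
	# drop the top-level directory of each archive before comparing
	if ! tar -xzf "$old" -C "$tmp/old" --strip-components=1 ||
	   ! tar -xzf "$new" -C "$tmp/new" --strip-components=1; then
		rm -rf "$tmp"
		return 2
	fi
	# run diff from inside tmp so the output shows old/... and new/... paths
	(cd "$tmp" && diff -ru old new)
	status=$?
	rm -rf "$tmp"
	return $status
}

# Example (placeholder paths):
#   compare_tarballs /opt/sw/src/foo-1.0.tar.gz /opt/sw2/src/foo-1.0.tar.gz | less
```

An empty diff with differing checksums points to a pure repackaging (timestamps, compression settings, directory name), while any output means the contents themselves changed.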

@dhomeier
Contributor

dhomeier commented Jan 13, 2023

Uh, the source directory has changed to asciidoc-py-8.6.10. And the files are actually about a year newer, with such inconspicuous changes as converting everything to python3!

--- asciidoc-8.6.10/asciidoc.py 2017-09-29 03:10:02
+++ asciidoc-py-8.6.10/asciidoc.py      2018-05-26 03:08:44
@@ -1,4 +1,4 @@
-#!/usr/bin/env python2
+#!/usr/bin/env python3
 """
 asciidoc - converts an AsciiDoc text file to HTML or DocBook
 
@@ -62,7 +62,7 @@
         d._keys = self._keys[:]
         return d
     def items(self):
-        return zip(self._keys, self.values())
+        return list(zip(self._keys, list(self.values())))
     def keys(self):
         return self._keys
     def popitem(self):

The Python files and the html*.conf, that's it – still great for not even changing the bugfix release number.

@nieder
Member

nieder commented Jan 13, 2023

Looks like upstream moved from github/asciidoc to github/asciidoc-py, but that's lame to silently change older tags to python3.
Also, not the first people this has bitten: asciidoc-py/asciidoc-py#190

Probably should deal with fixing this in another issue/PR since it's no longer about just the SHA256.

@dhomeier
Contributor

Indeed, brilliant!
On the plus side, the new upstream actually builds with macOS 12+ python3 – almost out of the box, except that they left the manpage a2x.py script of all things in python2 syntax!

@nieder
Member

nieder commented Nov 4, 2023

devel/lazarus-doc.info @kamischi
bioconductor-globalancova-r.info

These are the only packages left in !base still using Source-MD5. Our %v for bioconductor-GlobalAncova has been pulled for newer releases, and lazarus-doc was silently updated upstream and can't be verified to be the same (#1058). Still a bunch of packages using SHA1.
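
For anyone who wants to reproduce a list like this, here is a rough sketch that searches a directory tree of .info files by checksum type. The function name `list_checksum_users` is invented for this example, it only recognizes fields that start a line and name the algorithm explicitly, and excluding base is left to the caller (for example by pointing it at specific subdirectories). It is not necessarily how the list above was produced.

```bash
#!/bin/bash
# List .info files under a directory whose source checksum fields use the given algorithm.
# Usage: list_checksum_users md5|sha1|sha256 [directory]
list_checksum_users() {
	local kind="$1" dir="${2:-.}" pattern
	case "$kind" in
		# MD5 can appear as the legacy Source-MD5 field or as Source-Checksum: MD5(...)
		md5)    pattern='^Source[0-9]*-(MD5:|Checksum:[[:space:]]*MD5\()' ;;
		sha1)   pattern='^Source[0-9]*-Checksum:[[:space:]]*SHA1\(' ;;
		sha256) pattern='^Source[0-9]*-Checksum:[[:space:]]*SHA256\(' ;;
		*)      echo "unknown checksum type: $kind" >&2; return 2 ;;
	esac
	# -l prints only file names, -i ignores case, -E enables the alternation above
	find "$dir" -name '*.info' -exec grep -liE "$pattern" {} +
}

# Example (placeholder path):
#   list_checksum_users sha1 path/to/finkinfo | wc -l
```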

@nieder
Member

nieder commented Nov 5, 2023

These are the only packages left in !base that are using Source-Checksum: SHA1:

gdcm-2.4.5.info (removed upstream)
googlechart-py.info (zip of commit changed, no tags from archived project). Change to PyPI or kill package?

@dmacks
Member Author

dmacks commented Nov 5, 2023

> These are the only packages left in !base that are using Source-Checksum: SHA1:
>
> gdcm-2.4.5.info (removed upstream)
> googlechart-py.info (zip of commit changed. no tags from archived project). change to PyPi or kill package?

Thanks for doing the heavy lifting on this ticket! I support killing gdcm-2.4.5 altogether, as it's an older libversion and has a java dependency (unlike the newer libversion). There's only one dependent; I'll look at switching it over.

UPDATE: I killed the java dep in the older gdcm. The dependent, insighttoolkit45, FTBFS even on my 10.13 with a ton of C++ errors, so I can't work on scrapping old gdcm. Upstream has a newer insighttoolkit (maybe 413?) if someone wants to look at updating it. In the meantime, I'll update gdcm-2.4.5 to SHA256.

@dmacks
Member Author

dmacks commented Nov 6, 2023

> These are the only packages left in !base that are using Source-Checksum: SHA1:
>
> googlechart-py.info (zip of commit changed. no tags from archived project). change to PyPi or kill package?

Kill. I don't see it in other distros; explicitly abandoned upstream in 2018.
