This is my attempt at creating a deterministic archive from an SVN repository.
It requires GNU tar 1.27 or later.
Features:
- Deterministic tarball
- Microsecond timestamp precision from SVN, thanks to the pax extended header
- Revision ID stored in the archive comment, just as git archive would do it
- I demonstrate both .tar.gz and .tar.xz compression. For gzip, you can optimize the compression further with advdef from AdvanceCOMP (advdef uses the zopfli library).
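For the .tar.gz case, determinism also depends on gzip not recording the input file's name and modification time in its header, which is what the --no-name option is for. The following is only a minimal sketch of that effect, assuming GNU gzip; the file names and the payload are placeholders and not part of the real workflow.

```shell
#!/bin/sh
# Sketch: show that "gzip --no-name" output depends only on the input bytes.
# Assumes GNU gzip. File names and contents are made-up placeholders.
set -e

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Two inputs with identical bytes but different names and modification times.
printf 'same payload\n' > "$work/first.tar"
cp "$work/first.tar" "$work/second.tar"
touch -t 200101010000 "$work/second.tar"

# Default mode: gzip may record the input's name and mtime in its header.
gzip -9 -c "$work/first.tar"  > "$work/first.default.gz"
gzip -9 -c "$work/second.tar" > "$work/second.default.gz"

# --no-name: neither the name nor the timestamp is stored.
gzip --no-name -9 -c "$work/first.tar"  > "$work/first.noname.gz"
gzip --no-name -9 -c "$work/second.tar" > "$work/second.noname.gz"

if cmp -s "$work/first.default.gz" "$work/second.default.gz"; then
    default_result=identical
else
    default_result=different
fi
if cmp -s "$work/first.noname.gz" "$work/second.noname.gz"; then
    noname_result=identical
else
    noname_result=different
fi

echo "default:   $default_result"
echo "--no-name: $noname_result"
```

The two --no-name outputs should compare as identical, while the default outputs are expected to differ because of the stored metadata.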
As an example, I use the Subversion repository itself as the source to check out and build the tarball from. Note that this is not how the Subversion project packages its own tarballs, and I have no affiliation with Subversion development. It is just an example, after all.
    # URL of repository to export
    url="https://svn.apache.org/repos/asf/subversion/tags/1.9.7/"
    # Name of distribution sub-directory
    dist_name=subversion-1.9.7-test
    # ---------------------------------------------------------------------

    info=$(svn info --xml "$url" | tr -- '\t\n' '  ')
    revision=$(echo "$info" |
        sed 's|.*<commit[^>]* revision="\([^">]*\)"[^>]*>.*|\1|')
    tar_name=${dist_name}-r${revision}

    # Subversion's commit timestamps can be as precise as 0.000001 seconds,
    # but sub-second precision is only available through the --xml output
    # format.
    date=$(echo "$info" |
        sed 's|.*<commit[^>]*>.*<date>\([^<]*\)</date>.*</commit>.*|\1|')

    # Factors that would make the tarball non-deterministic include:
    # - umask
    # - Ordering of file names
    # - Timestamps of directories ("svn export" doesn't update them)
    # - User and group names and IDs
    # - Format of tar (gnu, ustar or pax)
    # - For pax format, the name and contents of extended header blocks

    umask u=rwx,go=rx
    svn export -r "$revision" "$url" "$dist_name"

    # "svn export" sets each file's modification time to the time of the
    # latest commit that modified the file, but won't do so for directories.
    # Only the exported tree is touched, not the whole current directory.
    find "$dist_name" -type d -exec touch -c -m -d "$date" -- '{}' +

    trap 's=$?; rm -f "${tar_name}.tar" || : ; exit $s' 1 2 3 15

    # The pax extended header allows sub-second precision on modification
    # time. The default extended header name pattern ("%d/PaxHeaders.%p/%f")
    # would contain a process ID that makes the tarball non-deterministic.
    # "git archive" would store a commit ID in the pax global header (named
    # "pax_global_header"). We can do something similar.
    # GNU tar (<=1.30) has a bug: it rejects a globexthdr.mtime that
    # contains a fraction of a second, so the fraction is stripped here.
    pax_options=$(printf '%s%s%s%s%s%s' \
        "globexthdr.name=pax_global_header," \
        "globexthdr.mtime={$(echo "$date" | sed -e 's/\.[0-9]*Z/Z/g')}," \
        "comment=${revision}," \
        "exthdr.name=%d/PaxHeaders/%f," \
        "delete=atime," \
        "delete=ctime")

    # List directories with a trailing slash, sort byte-wise, and archive
    # exactly that list.
    find "$dist_name" \
        \( -type d -exec printf '%s/\n' '{}' \; \) -o -print |
        LC_ALL=C sort |
        tar -c --no-recursion --format=pax --owner=root:0 --group=root:0 \
            --pax-option="$pax_options" -f "${tar_name}.tar" -T -

    # Compression (gzip/xz) can add additional non-deterministic factors.
    # The xz format does not store a file name or timestamp...
    trap 's=$?; rm -f "${tar_name}.tar.xz" || : ; exit $s' 1 2 3 15
    xz -9e -k "${tar_name}.tar"

    # ...but for gzip, you need either the --no-name option or to feed the
    # input from stdin. This example uses the former, and also tries advdef
    # to optimize the compression if it is available.
    trap 's=$?; rm -f "${tar_name}.tar.gz" || : ; exit $s' 1 2 3 15
    gzip --no-name -9 -k "${tar_name}.tar" &&
        { advdef -4 -z "${tar_name}.tar.gz" || : ; }
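One way to check that the tar options above give reproducible output, without needing network access to an SVN server, is to build the same small tree twice and compare the results byte for byte. This is only a sketch under stated assumptions: it needs GNU tar 1.27 or later and GNU touch, and the make_tree and pack helpers, the demo tree, and the timestamp are hypothetical stand-ins for "svn export" and the real repository.

```shell
#!/bin/sh
# Sketch: build the same small tree twice and check that the tarballs match.
# Assumes GNU tar >= 1.27 and GNU touch. All names below are placeholders.
set -e

# Sample timestamp in the format "svn info --xml" reports.
stamp='2017-08-10T12:34:56.123456Z'

# make_tree DIR: create a tiny tree under DIR (stand-in for "svn export")
# and pin the modification time of every entry, directories included.
make_tree() {
    mkdir -p "$1/demo/sub"
    printf 'hello\n' > "$1/demo/sub/b.txt"
    printf 'world\n' > "$1/demo/a.txt"
    find "$1/demo" -exec touch -c -m -d "$stamp" -- '{}' +
}

# pack DIR TARFILE: archive DIR/demo into TARFILE (an absolute path) with
# sorted names, fixed ownership, and a PID-free extended header name.
pack() {
    ( cd "$1" &&
      find demo \( -type d -exec printf '%s/\n' '{}' \; \) -o -print |
          LC_ALL=C sort |
          tar -c --no-recursion --format=pax --owner=root:0 --group=root:0 \
              --pax-option='exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime' \
              -f "$2" -T - )
}

umask u=rwx,go=rx
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

make_tree "$work/one"
make_tree "$work/two"
pack "$work/one" "$work/one.tar"
pack "$work/two" "$work/two.tar"

if cmp -s "$work/one.tar" "$work/two.tar"; then
    result=identical
else
    result=different
fi
echo "tarballs are $result"
```

For the real script, the equivalent check would be to run it twice in separate working directories and compare the resulting .tar, .tar.xz and .tar.gz files with cmp or a checksum tool.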