Commit

Merge #2061 from branch '1058-updateMonthlyRvkFromCulturegraph' of github.com:hbz/lobid-resources
dr0i committed Aug 27, 2024
2 parents b387263 + 2c80e61 commit ac779ce
Showing 2 changed files with 23 additions and 1 deletion.
22 changes: 22 additions & 0 deletions scripts/generateRvkConcordance.sh
@@ -0,0 +1,22 @@
#!/bin/bash
# Date: 2024-08
# Description: downloads the monthly aggregated data dump from Culturegraph and extracts the RVK data into a TSV.
# Called from crontab every second Wednesday of the month.
# Takes ~5.5 h as a single process on quaoar.
# Generated TSV: ~257 MB
# See https://github.com/hbz/lobid-resources/issues/1058.

URL_ROOT="https://data.dnb.de/culturegraph/"
TARGET_FNAME="/data/other/cg/aggregate.marcxml.gz"

FNAME=$(curl -s "$URL_ROOT" | grep '<a href="aggregate_' | sed 's#.*<a href="\(aggregate_[^"]*\)".*#\1#')
echo "Got filename: $FNAME"
wget "$URL_ROOT$FNAME" -O "$TARGET_FNAME"

FNAME_SIZE=$(ls -s "$TARGET_FNAME" | cut -d ' ' -f1)
if [ "$FNAME_SIZE" -gt 8654321 ]; then # ~8.25 GiB in GNU ls 1K blocks; aggregate_20240507.marcxml.gz was 9593288 blocks
  cd "$(dirname "$0")/.." || exit 1 # repo root, independent of cron's working directory
  export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
  mvn exec:java -Dexec.mainClass="org.lobid.resources.run.CulturegraphXmlFilterHbzRvkToTsv" -Dexec.args="$TARGET_FNAME"
fi
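The header says the script runs from crontab every second Wednesday of the month. Standard cron has no field for "n-th weekday of the month", and when both the day-of-month and day-of-week fields are restricted, cron runs the job if *either* matches. A common workaround is to schedule the job on days 8 to 14 and let a guard check the weekday, since the second Wednesday always falls in that range. The helper below is a hypothetical sketch of such a guard (`is_second_wednesday` is not part of the repository); it assumes GNU `date` for `-d` and the `%-d` no-padding flag.

```shell
#!/bin/bash
# Hypothetical guard: succeeds only if the given date (default: today)
# is the second Wednesday of its month. Requires GNU date (-d, %-d).
is_second_wednesday() {
  local day="${1:-today}" dow dom
  dow=$(date -d "$day" +%u) || return 1   # ISO weekday: 1=Monday ... 7=Sunday
  dom=$(date -d "$day" +%-d) || return 1  # day of month without zero padding
  # The second Wednesday of any month is always on day 8..14.
  [ "$dow" -eq 3 ] && [ "$dom" -ge 8 ] && [ "$dom" -le 14 ]
}
```

The same check can also live directly in a crontab line, where `%` must be escaped as `\%`; schedule and path here are placeholders: `0 2 8-14 * * [ "$(date +\%u)" -eq 3 ] && /path/to/scripts/generateRvkConcordance.sh`.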

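The script derives the dump's file name by scraping the HTML index at `URL_ROOT` for a link starting with `aggregate_`. The markup of that page is not shown in this commit, so the sketch below assumes plain `<a href="...">` links; the function name `aggregate_names` is hypothetical. Using `grep -o` prints each match on its own line even when several links share one line, and `[^"]*` stops each match at the closing quote instead of running greedily to the last quote on the line.

```shell
#!/bin/bash
# Hypothetical helper: read an HTML directory listing on stdin and print
# every link target that starts with "aggregate_", one per line.
aggregate_names() {
  grep -o 'href="aggregate_[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}
```

If the listing ever contains more than one `aggregate_*` link, `curl -s "$URL_ROOT" | aggregate_names | sort | tail -n 1` would pick a single name. That picks the newest dump only under the assumption that all names embed a `YYYYMMDD` date, as `aggregate_20240507.marcxml.gz` does.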
@@ -22,7 +22,7 @@
* @author Tobias Bülte (TobiasNx)
**/
public final class CulturegraphXmlFilterHbzRvkToTsv {
- private static String OUTPUT_FILE="rvk.tsv";
+ private static String OUTPUT_FILE="lookup-tables/data/rvk.tsv";

public static void main(String... args) {
String XML_INPUT_FILE = new File(args[0]).getAbsolutePath();