Translate and adjust count example to fix #589
I can't adjust the metrics example yet: I don't know what output the old morph produces, since it uses square.
TobiasNx committed Jan 28, 2025
1 parent b02019c commit 875aa68
Showing 12 changed files with 200 additions and 5 deletions.
@@ -4,7 +4,7 @@ fileName|
open-file|
as-lines|
decode-pica|
-morph(FLUX_DIR + "gnd-type.xml")|
+fix(FLUX_DIR + "gnd-type.fix")|
stream-to-triples|
count-triples(countBy="object")|
template("${s}\t${o}")|
@@ -0,0 +1,6 @@
if any_match("[email protected]","...*")
replace_all("[email protected]","^(..).*","$1") # only keep the first two letters
retain("[email protected]") # only keep the relevant element
else
reject()
end
11 changes: 11 additions & 0 deletions metafacture-runner/src/main/dist/examples/count/subjects/10.pica

Large diffs are not rendered by default.

@@ -0,0 +1,8 @@
do list(path:"041A*","var":"$i")
copy_field("$i.9","relevantField.$append")
end

trim("relevantField.*")
uniq("relevantField")

retain("relevantField")
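As a reading aid, the following plain-Python sketch emulates what this Fix, followed by `stream-to-triples` and `count-triples(countBy="object")`, appears to compute: per record, subfield 9 of every `041A*` field is collected, trimmed and de-duplicated, and the remaining values are then counted across all records. It is not Metafacture code. The record layout (a list of `(field_name, subfields)` tuples), the function names and the sample data are assumptions made for illustration only.

```python
from collections import Counter

def relevant_values(record):
    """Collect subfield 9 of every 041A* field, trimmed and de-duplicated.

    `record` is a hypothetical in-memory stand-in for a decoded PICA record:
    a list of (field_name, [(subfield_code, value), ...]) tuples.
    """
    seen = []
    for field_name, subfields in record:
        if not field_name.startswith("041A"):
            continue  # mirrors the "041A*" path pattern
        for code, value in subfields:
            if code == "9":
                value = value.strip()      # trim("relevantField.*")
                if value not in seen:      # uniq("relevantField")
                    seen.append(value)
    return seen

def count_references(records):
    """Count how many records reference each value (count by object)."""
    counts = Counter()
    for record in records:
        counts.update(relevant_values(record))
    return counts

# Invented sample data, not taken from 10.pica.
sample = [
    [("041A", [("9", " 123 "), ("a", "x")]), ("041A/01", [("9", "456")])],
    [("041A", [("9", "123")]), ("041A", [("9", "123")]), ("003@", [("0", "r2")])],
]
counts = count_references(sample)
```

Because duplicates are dropped per record before counting, a value referenced twice in one record contributes only once to its total.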
@@ -1,5 +1,5 @@

-default counts="myflux/counts.dat";
+default counts=FLUX_DIR + "counts.dat";
default catalogue = FLUX_DIR + "10.pica";

//count references
@@ -10,10 +10,9 @@ open-file|
as-lines|
catch-object-exception|
decode-pica|
-morph(FLUX_DIR + "references.xml")|
+fix(FLUX_DIR + "references.fix")|
stream-to-triples|
count-triples(countBy="object")|

-write("subjects.dat");
+write(counts);


@@ -0,0 +1,11 @@
default fileName = FLUX_DIR + "gnd-sample.pica";

fileName|
open-file|
as-lines|
decode-pica|
morph(FLUX_DIR + "gnd-type.xml")|
stream-to-triples|
count-triples(countBy="object")|
template("${s}\t${o}")|
write("stdout");

Large diffs are not rendered by default.

@@ -0,0 +1,14 @@

catalogue|
open-file|
as-lines|
catch-object-exception|
decode-pica|
batch-log(batchsize="100000")|
morph(FLUX_DIR + "subject-cooccurrence.xml")|
stream-to-triples|
count-triples(countBy="object")|
calculate-metrics("X2")|
template("${s} ${o}")|
//write("stdout");
write(FLUX_DIR+"x2.dat");
@@ -0,0 +1,27 @@
<?xml version="1.0" encoding="UTF-8"?>
<metamorph xmlns="http://www.culturegraph.org/metamorph"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1">
<rules>

<data source="041A*.9" name="@subj">
<trim />
<unique />
</data>

<square delimiter="&amp;" name="">
<data source="@subj" name=""/>
<postprocess>
<compose prefix="2:"/>
</postprocess>
</square>

<data source="@subj" name="">
<compose prefix="1:"/>
</data>

<data source="@subj" name="">
<occurrence only="1" />
<constant value="1:" />
</data>
</rules>
</metamorph>
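The commit message notes that the output of this morph is unclear because it uses `square`. The plain-Python sketch below shows one reading of the rules above, under the assumption that `square` emits every unordered pair of the sorted collected values joined by the delimiter. Per record it yields pair keys prefixed `2:`, single-subject keys prefixed `1:`, and one bare `1:` key when the record has at least one subject. The function name and the sample input are hypothetical, and the actual Metamorph behaviour should be checked against its documentation.

```python
from itertools import combinations

def cooccurrence_keys(subjects, delimiter="&"):
    """Emulate the keys one record would contribute (assumed semantics)."""
    # <trim/> and <unique/> on @subj: strip whitespace, drop duplicates.
    unique = sorted({s.strip() for s in subjects})

    # <square delimiter="&"> with prefix "2:": assumed to emit every
    # unordered pair of the sorted values.
    keys = ["2:" + a + delimiter + b for a, b in combinations(unique, 2)]

    # <compose prefix="1:"/>: one key per distinct subject.
    keys += ["1:" + s for s in unique]

    # <occurrence only="1"/> + <constant value="1:"/>: one bare "1:" key
    # per record that has any subject (a total-record counter).
    if unique:
        keys.append("1:")
    return keys

# Invented input, not taken from the catalogue.
example = cooccurrence_keys(["b", "a", "b "])
```

Under this reading, `count-triples` would supply the pair counts, the per-subject counts and the total that `calculate-metrics("X2")` needs.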
@@ -0,0 +1,19 @@

default counts=FLUX_DIR + "counts.dat";
default catalogue = FLUX_DIR + "10.pica";

//count references
"counting references in " + catalogue | write("stdout");

catalogue|
open-file|
as-lines|
catch-object-exception|
decode-pica|
morph(FLUX_DIR + "references.xml")|
stream-to-triples|
count-triples(countBy="object")|

write("subjects.dat");

