Create a fetchresourcelist plugin that queries Drupal for media to check #14

mjordan · 2018-10-28T19:10:04Z

Related to #6 and Islandora/documentation#945.

We should have a fetchresourcelist plugin that queries Drupal for resources to check. The code below is a working proof of concept. It requires that the Drupal JSON API contrib module is enabled.

<?php
// The content type will need to be a Riprap admin option, as will the limit
// (the JSON API's maximum appears to be 50 items per page). In this example,
// we request 3 nodes starting at offset 2.
$page_url = "http://localhost:8000/jsonapi/node/islandora_object?page[offset]=2&page[limit]=3";

$page_output = file_get_contents($page_url);
$page_output = json_decode($page_output, true);

// Taxonomy terms to check will need to be a Riprap admin option.
$taxonomy_terms_to_check = array('/taxonomy/term/2'); // "Preservation master"

// At this point, we have a list of 3 nodes.
foreach ($page_output['data'] as $node) {
  $nid = $node['attributes']['nid']; 
  // Get the media associated with this node using the Islandora-supplied Manage Media View.
  $media_url = "http://admin:islandora@localhost:8000/node/" . $nid . "/media?_format=json";
  $media_data = file_get_contents($media_url);
  $media_data = json_decode($media_data);
  // Loop through all the media and pick the ones that
  // are tagged with terms in $taxonomy_terms_to_check.
  foreach ($media_data as $media) {
    // !empty() also covers media where the field is missing or null.
    if (!empty($media->field_tags)) {
      foreach ($media->field_tags as $term) {
        if (in_array($term->url, $taxonomy_terms_to_check)) {
          // @todo: Convert to the equivalent Fedora URL and add to the plugin's output.
          // @todo: Add option to not convert to Fedora URL if the site doesn't use Fedora.
          // In that case, we need to figure out how to get Drupal's checksum for the file over HTTP.
          var_dump($media->field_media_image[0]->url);
        }
      }
    }
  }
}

We will also need to persist the page number to request during the next scheduled job. This should probably go into a db table.
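
One way to track the position between scheduled jobs is sketched below. It is only a sketch under assumptions: the function names, the file-based storage, and the rule "a short page means we reached the end, so start again at 0" are all hypothetical and do not come from Riprap itself. A db table could replace the file without changing the logic.

```php
<?php
// Hypothetical helper: decide which offset to request on the next run.
// Assumption: a page with fewer items than the limit is the last page,
// so the next run starts over at offset 0.
function next_offset(int $current_offset, int $limit, int $items_returned): int
{
    if ($items_returned < $limit) {
        return 0;
    }
    return $current_offset + $limit;
}

// Hypothetical persistence in a plain file. Returns 0 if nothing is stored.
function load_offset(string $path): int
{
    return is_readable($path) ? (int) file_get_contents($path) : 0;
}

function save_offset(string $path, int $offset): void
{
    file_put_contents($path, (string) $offset);
}
```

A job would call load_offset() before building the request and save_offset(next_offset(...)) after processing the page.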

mjordan · 2018-10-29T17:10:06Z

Once Islandora-Devops/migrate_7x_claw#9 gets merged, the above code should look like:

<?php
// The content type will need to be a Riprap admin option, as will the limit
// (the JSON API's maximum appears to be 50 items per page). In this example,
// we request 3 nodes starting at offset 2.
$page_url = "http://localhost:8000/jsonapi/node/islandora_object?page[offset]=2&page[limit]=3";

$page_output = file_get_contents($page_url);
$page_output = json_decode($page_output, true);

// Taxonomy terms to check will need to be a Riprap admin option.
// "Original File" and "Preservation Master File"
$taxonomy_terms_to_check = array('/taxonomy/term/15', '/taxonomy/term/16');

// At this point, we have a list of 3 nodes.
foreach ($page_output['data'] as $node) {
  $nid = $node['attributes']['nid']; 
  // Get the media associated with this node using the Islandora-supplied Manage Media View.
  $media_url = "http://admin:islandora@localhost:8000/node/" . $nid . "/media?_format=json";
  $media_data = file_get_contents($media_url);
  $media_data = json_decode($media_data);
  // Loop through all the media and pick the ones that
  // are tagged with terms in $taxonomy_terms_to_check.
  foreach ($media_data as $media) {
    // !empty() also covers media where the field is missing or null.
    if (!empty($media->field_media_use)) {
      foreach ($media->field_media_use as $term) {
        if (in_array($term->url, $taxonomy_terms_to_check)) {
          // @todo: Convert to the equivalent Fedora URL by querying Gemini
          // using the value of $media->field_media_image[0]->target_uuid to get this type of response:
          // {
          //  "drupal":"http:\/\/localhost:8000\/_flysystem\/fedora\/masters\/testing_12_OBJ.jpg",
          //  "fedora":"http:\/\/localhost:8080\/fcrepo\/rest\/masters\/testing_12_OBJ.jpg"
          // }
          // The Fedora URL is the one Riprap needs to validate the fixity of.
          // @todo: Add option to not convert to Fedora URL if the site doesn't use Fedora.
          // In that case, we need to figure out how to get Drupal's checksum for the file over HTTP.
        }
      }
    }
  }
}

mjordan · 2018-12-10T17:49:34Z

According to https://www.drupal.org/docs/8/modules/jsonapi/sorting, we can:

- use a collection based on content type to limit nodes
- sort. We probably want to sort by creation date descending, so newly added nodes get checked as soon as possible (if we sorted ascending, the default, newly added nodes wouldn't get checked until the end of a cycle). I can't think of a use case for sorting ascending, but maybe we can offer this as a Riprap config option if needed.
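
A minimal sketch of building such a request URL follows. The function name and its parameters are hypothetical, and the sort field name "created" is an assumption about the node's creation-date field; a leading "-" asks JSON:API for descending order.

```php
<?php
// Hypothetical helper: build a JSON:API collection URL for one content type,
// with pagination and sorting. http_build_query() percent-encodes the
// square brackets in page[offset] and page[limit].
function build_collection_url(
    string $base_url,
    string $content_type,
    int $offset,
    int $limit,
    string $sort = '-created' // assumed field name; "-" means descending
): string {
    $query = http_build_query(
        [
            'page' => ['offset' => $offset, 'limit' => $limit],
            'sort' => $sort,
        ],
        '',
        '&'
    );
    return rtrim($base_url, '/') . '/jsonapi/node/' . $content_type . '?' . $query;
}
```

Passing a sort value without the leading "-" would give the ascending order, which is how a Riprap config option could switch between the two.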

mjordan · 2018-12-10T17:53:30Z

We'll also need to include Basic auth credentials in Riprap for the JSON API and Views REST.
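
As a rough sketch, the credentials could be sent as an Authorization header instead of being embedded in the URL as in the proof of concept. The helper names below are hypothetical, and the use of file_get_contents() with a stream context simply mirrors the earlier snippets; it is not necessarily how Riprap makes its requests.

```php
<?php
// Hypothetical helper: build the HTTP Basic auth header line.
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Hypothetical helper: GET a URL with Basic auth and decode the JSON body.
// Returns null if the request fails or the body is not valid JSON.
function fetch_json(string $url, string $user, string $password)
{
    $context = stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => basic_auth_header($user, $password) . "\r\n",
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : json_decode($body);
}
```

The same header would work for both the JSON API collection request and the Views REST request for a node's media.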

mjordan · 2018-12-11T14:13:01Z

Work in the issue-14 branch can now parse out the Drupal URLs of images attached to nodes:

php bin/console app:riprap:check_fixity
string(57) "http://localhost:8000/_flysystem/fedora/testing_8_OBJ.jpg"
string(57) "http://localhost:8000/_flysystem/fedora/testing_7_OBJ.jpg"
string(57) "http://localhost:8000/_flysystem/fedora/testing_6_OBJ.jpg"

This comes from each media entity's field_media_image field. We need to make sure that non-image files are also detected (i.e., what field do we use for non-image files?).

mjordan · 2018-12-11T14:16:10Z

Non-image files are in field_media_file.
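
Picking the URL from whichever of the two fields is populated could look roughly like this. The function name is hypothetical, and the sketch assumes each field is an array of objects with a url property, as in the Views REST output used earlier in this thread.

```php
<?php
// Hypothetical helper: return the file URL for a media entity, looking in
// field_media_image first and then field_media_file. Returns null when
// neither field holds a value.
function media_file_url(object $media): ?string
{
    foreach (['field_media_image', 'field_media_file'] as $field) {
        $values = $media->$field ?? [];
        if (!empty($values) && isset($values[0]->url)) {
            return $values[0]->url;
        }
    }
    return null;
}
```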

mjordan · 2018-12-11T19:43:07Z

The only thing not working is authenticating against Gemini using a JWT.

mjordan · 2018-12-29T22:46:25Z

The app:riprap:plugin:fetchresourcelist:from:drupal plugin is complete, but I'm getting some strange behavior. When Riprap hits the last page of a JSON:API request, it throws a cURL error:

In CurlFactory.php line 186:
                                                                                         
  cURL error 3: <url> malformed (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)

However, the URL triggering this error works as expected (200 response code) when requested using curl on the command line, e.g., curl -v -uadmin:islandora "http://localhost:8000/jsonapi/node/islandora_object?page%5Blimit%5D=5&page%5Boffset%5D=10&sort=-changed".

mjordan · 2018-12-29T23:43:47Z

OK, have tracked this down to an empty $media_list on a node.
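
The thread does not say how this was fixed, so the following is only one possible guard, with a hypothetical function name: return early when a node's media list is empty (or could not be decoded) so that no further request is built from missing values.

```php
<?php
// Hypothetical helper: from a node's decoded media list, keep only the media
// tagged with one of the given taxonomy term URLs. An empty or undecodable
// list yields an empty result instead of being processed further.
function media_to_check($media_list, array $terms_to_check): array
{
    // json_decode() returns null on invalid input; a node without media
    // gives an empty array. Either way there is nothing to check.
    if (!is_array($media_list) || count($media_list) === 0) {
        return [];
    }
    $matches = [];
    foreach ($media_list as $media) {
        foreach ($media->field_media_use ?? [] as $term) {
            if (in_array($term->url, $terms_to_check, true)) {
                $matches[] = $media;
                break; // one matching term is enough
            }
        }
    }
    return $matches;
}
```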

mjordan · 2018-12-30T19:02:48Z

Closed with 342ba23.

mjordan self-assigned this Oct 28, 2018

mjordan mentioned this issue Dec 10, 2018

Setup instructions based on claw-playbook #20

Closed

mjordan added a commit that referenced this issue Dec 10, 2018

Initial work on #14.

d8eb3d1

mjordan added a commit that referenced this issue Dec 10, 2018

Work on #14.

e2884f6

mjordan added a commit that referenced this issue Dec 11, 2018

Work on #14.

9832337

mjordan added a commit that referenced this issue Dec 11, 2018

Work on #14.

aef5804

mjordan added a commit that referenced this issue Dec 11, 2018

Work on #14.

faaa385

mjordan added a commit that referenced this issue Dec 18, 2018

Work on #14.

5a39c3c

mjordan added a commit that referenced this issue Dec 29, 2018

Work on #14.

cc2d0f6

mjordan added a commit that referenced this issue Dec 29, 2018

Work on #14.

5bdd669

mjordan added a commit that referenced this issue Dec 30, 2018

Work on #14.

8abb37a

mjordan added a commit that referenced this issue Dec 30, 2018

Work on #14.

cd01d8b

mjordan closed this as completed Dec 30, 2018
