Participate in LaMachine v2? #21

Open
proycon opened this issue Mar 29, 2018 · 3 comments

Comments

@proycon

proycon commented Mar 29, 2018

I've been following this interesting project as there is some overlap with what I'm doing with LaMachine. I just released v2 today (https://proycon.github.io/LaMachine/) and am now starting to look at participation from other partners. As I mentioned in an earlier e-mail, I think the Newsreader software fits perfectly in LaMachine and would be a valuable addition. Hence my invitation to you guys to participate!

When it comes to the dependencies you need, these are already included in LaMachine:

For svmlight and libsvm I don't foresee much problem in including them either.

You guys aim for a Docker container, which is of course a nice solution and one that LaMachine provides as well. With LaMachine you also get flexibility and other options (VM, local installation, remote server, one-command install, etc.); the goal was to solve this on one level so that not everybody has to do it themselves. LaMachine uses Ansible (so you don't write a Dockerfile yourself) and the software is grouped into logical 'packages/roles', of which Newsreader could be one (Alpino, for example, is another). I have written contributor guidelines here: https://github.com/proycon/LaMachine/blob/master/CONTRIBUTING.md . Participation might spare some duplicate work and broaden the reach of both our efforts.

And to prevent any confusion: LaMachine is just a software distribution, not an NLP pipeline system (though it may include pipelines in whatever shape or form). The goal is to make things easily installable on a wide variety of systems and in a variety of forms, which is currently a major headache with Newsreader.

You may or may not want to replace your newsreader.sh with something more robust like Nextflow eventually, but that is a completely unrelated matter.

@wmkouw
Contributor

wmkouw commented Mar 30, 2018

Hi Maarten,

Martine and I saw your email. We're still discussing things; sorry we haven't responded yet.

I agree that there is a lot of overlap and that it would be relatively straightforward to include NewsReader in LaMachine. However, we feel like we can't make this decision. NewsReader was developed by the CLTL. I feel like they should decide whether they want this to happen (which could be done at the upcoming meeting at the KNAW).

But I totally agree with your arguments; it's good to allow for multiple types of use and installation (Docker, VM, remote server, build script, etc.). At the moment, NewsReader is indeed missing an easy local install of the whole pipeline. It would be great if we could achieve this through LaMachine. And we would most certainly like to avoid duplicate work (that's also what we would like to coordinate with this work package group from CLARIAH in the upcoming meeting).

For context: this current implementation is not a high-quality build. It's just so we can run NewsReader on the data of the EviDENce project. Of course, we're documenting everything and the Docker image build should be automatic, but it is not going to be production-quality. We're hoping to convince our project partners to allow us to invest a bit more time in a more robust implementation.

Thanks for the tip on Nextflow. We suggested implementing the pipeline with CWL, similar to Janneke van der Zwaan's nlppln. Every module would correspond to a single Dockerfile and the CWL pipeline would string modules together, making sure that inputs and outputs match and keeping track of versions. That should make things a lot more robust and easy to use. Plus, we can reconfigure pipelines (and possibly add more modules) with a small change in the workflow generator script.
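To make the idea of stringing modules together a bit more concrete, here is a minimal sketch in plain Python (not CWL, and not the actual NewsReader setup). It only illustrates the two checks mentioned above: that each module's output matches the next module's input, and that versions are recorded. The `Module` class, the `validate_pipeline` function, and all module names, versions, and format labels are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One pipeline step; in the proposal each would map to one Docker image."""
    name: str
    version: str
    consumes: str  # label of the format the step expects as input
    produces: str  # label of the format the step emits


def validate_pipeline(modules):
    """Check that each step's output format matches the next step's input.

    Returns a list of "name==version" strings recording what would run.
    Raises ValueError on the first mismatch between adjacent steps.
    """
    for upstream, downstream in zip(modules, modules[1:]):
        if upstream.produces != downstream.consumes:
            raise ValueError(
                f"{upstream.name} produces {upstream.produces!r} but "
                f"{downstream.name} expects {downstream.consumes!r}"
            )
    return [f"{m.name}=={m.version}" for m in modules]


# Hypothetical module names, versions, and format labels (assumptions).
PIPELINE = [
    Module("tokenizer", "0.1", consumes="text", produces="tokens"),
    Module("pos-tagger", "0.1", consumes="tokens", produces="tagged"),
    Module("ner", "0.1", consumes="tagged", produces="entities"),
]

if __name__ == "__main__":
    print(validate_pipeline(PIPELINE))
```

Reordering or adding modules then only means editing the `PIPELINE` list; a mismatch between adjacent steps is reported before anything runs. A real CWL workflow would express the same constraints declaratively in its step inputs and outputs.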

I myself am not that familiar with Ansible, but at first glance, it looks very similar to what we had in mind with the CWL pipeline. I'll have a closer look soon.

@proycon
Author

proycon commented Apr 6, 2018

Hi Wouter, Martine, and also @antske (CLTL VU),

Thanks, no problem. I see where you're coming from; it's important to get things in a working state at least. Ideally, the upstream provider (CLTL in this case) should deliver software (or software components) that is properly installable and usable; for LaMachine this would be a prerequisite just as well. The fact that it isn't (not an accusation; I of course realize this is a non-trivial matter, especially with lots of interdependent software, more so when manpower is limited) requires you to undertake this project in the first place. Ideally you wouldn't need to fork CLTL's projects (in this repo). Though it's none of my business, I'm a bit worried about divergence if upstream does decide to take up the projects again.

Your project will prove valuable in any case if we add Newsreader to LaMachine at some point, I'll know where to look :)

As for Nextflow and CWL, I do believe they are working on CWL compatibility (https://www.nextflow.io/blog/2017/nextflow-and-cwl.html), but I have no experience with it myself. The idea behind CWL (independence of workflow implementation) is quite appealing, but I'm not sure if it's always worth it.

As for Ansible, it's what LaMachine uses for installation and configuration, which is what Ansible is designed for (you can compare it with things like Puppet and CFEngine if you happen to know those), but it's definitely not a technology for an NLP workflow/pipeline on par with CWL, Nextflow, Luigi, etc.

@antske
Collaborator

antske commented Apr 6, 2018 via email
