Participate in LaMachine v2? #21
Hi Maarten, Martine and I saw your email. We're still discussing things; sorry we haven't responded yet. I agree that there is a lot of overlap and that it would be relatively straightforward to include NewsReader in LaMachine. However, we feel that this decision is not ours to make. NewsReader was developed by the CLTL, and I feel they should decide whether they want this to happen (which could be done at the upcoming meeting at the KNAW). But I totally agree with your arguments; it's good to allow for multiple types of use and installation (Docker, VM, remote server, build script, etc.). At the moment, NewsReader is indeed missing an easy local install of the whole pipeline. It would be great if we could achieve this through LaMachine. And we would most certainly like to avoid duplicate work (that's also something we would like to coordinate with the CLARIAH work package group at the upcoming meeting).
For context: the current implementation is not a high-quality build. It's just so we can run NewsReader on the data of the EviDENce project. Of course, we're documenting everything and the Docker image build should be automatic, but it is not going to be production-quality. We're hoping to convince our project partners to allow us to invest a bit more time in a more robust implementation.
Thanks for the tip on Nextflow. We suggested implementing the pipeline with CWL, similar to Janneke van der Zwaan's nlppln. Every module would correspond to a single Dockerfile, and the CWL pipeline would string the modules together, making sure that inputs and outputs match and keeping track of versions. That should make things a lot more robust and easier to use. Plus, we can reconfigure pipelines (and possibly add more modules) with a small change in the workflow generator script.
I myself am not that familiar with Ansible, but at first glance it looks very similar to what we had in mind with the CWL pipeline. I'll have a closer look soon.
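To make the CWL idea above a bit more concrete, here is a minimal Python sketch of the checking step only: each module declares a version and its input and output formats, and the chain is rejected if adjacent modules do not fit together. This is not CWL and not the project's actual workflow generator; the `Module` class, `validate_pipeline`, and the module and format names are all hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Module:
    """One pipeline step (hypothetical; in the plan above, one Dockerfile each)."""
    name: str
    version: str
    input_format: str
    output_format: str


def validate_pipeline(modules: List[Module]) -> List[str]:
    """Check that every module's output format matches the next module's input.

    Returns a list of pinned "name==version" strings so the exact
    combination of module versions used for a run can be recorded.
    """
    for upstream, downstream in zip(modules, modules[1:]):
        if upstream.output_format != downstream.input_format:
            raise ValueError(
                f"{upstream.name} produces {upstream.output_format!r} but "
                f"{downstream.name} expects {downstream.input_format!r}"
            )
    return [f"{m.name}=={m.version}" for m in modules]


# Hypothetical modules and formats, for illustration only.
tokenizer = Module("tokenizer", "1.0", "text", "tokens")
tagger = Module("tagger", "0.3", "tokens", "tagged")

print(validate_pipeline([tokenizer, tagger]))
```

Swapping the two modules makes `validate_pipeline` raise a `ValueError`, which is the kind of mismatch a workflow description is meant to catch before anything is run.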
Hi Wouter, Martine, and also @antske (CLTL VU), Thanks, no problem, I see where you're coming from; it's important to get things in a working state at least. Ideally, the upstream provider (CLTL in this case) should deliver software (or software components) that is properly installable and usable; for LaMachine this would be a prerequisite just as well. The fact that it isn't (not an accusation; I of course realize this is a non-trivial matter, especially with lots of interdependent software, more so when manpower is limited) is what requires you to undertake this project in the first place. Ideally you wouldn't need to fork CLTL's projects (in this repo). Though it's none of my business, I'm a bit worried about divergence if upstream does decide to take up the projects again.
Your project will prove valuable in any case: if we add Newsreader to LaMachine at some point, I'll know where to look :)
As for Nextflow and CWL, I do believe they are working on CWL compatibility (https://www.nextflow.io/blog/2017/nextflow-and-cwl.html), but I have no experience with it myself. The idea behind CWL (independence of workflow implementation) is quite sympathetic, but I'm not sure if it's always worth it.
As for Ansible, it's what LaMachine uses for installation and configuration, which is what Ansible is designed for (you can compare it with things like Puppet and CFEngine if you happen to know those), but it's definitely not a technology for an NLP workflow/pipeline on par with CWL, Nextflow, Luigi, etc.
Hi all,
Thanks Maarten, for pointing that out! I think the issue tracker and
everything you noticed is really useful and we'll use them to improve our
modules and instructions. So overall, really happy with your work :-).
I agree with Maarten, though, that it is risky to fork these modules, since
they will most likely not stay in sync. There is thus a risk that people
will end up using old tools or try and combine incompatible tools.
I think it is great if tools are offered and presented in different ways,
but I think it is better to keep the repositories of the modules themselves
central.
If there are urgent fixes that need to be done for you to move ahead with
your project, please let us know and we'll try to make the necessary
changes in our repositories as soon as possible.
Best regards,
Antske
On Fri, Apr 6, 2018 at 9:59 PM, Maarten van Gompel wrote:
--
Computational Lexicology & Terminology Lab (CLTL)
Web & Media Group
The Network Institute, VU University Amsterdam
De Boelelaan 1105
1081 HV Amsterdam, The Netherlands
I've been following this interesting project, as there is some overlap with what I'm doing with LaMachine. I just released v2 today (https://proycon.github.io/LaMachine/) and am now starting to look at participation from other partners. As I said in an earlier e-mail, I think including the Newsreader software fits perfectly in LaMachine and would be a valuable addition. Hence my invitation to you guys to participate!
When it comes to the dependencies you need, these are already included in LaMachine:
For svmlight and libsvm I don't foresee much problem in including them either.
You guys aim for a Docker container, which is of course a nice solution and one that LaMachine provides as well, but with LaMachine you also get flexibility and other options (VM, local installation, remote server, one-command install, etc.). The goal was to solve this at one level so that not everybody has to do it themselves. LaMachine uses Ansible (so you don't write a Dockerfile yourself), and the software is grouped into logical 'packages/roles', of which newsreader could be one (alpino, for example, is another). I have written contributor guidelines here: https://github.com/proycon/LaMachine/blob/master/CONTRIBUTING.md . Participation might spare some duplicate work and broaden the reach of both our efforts.
And to prevent any confusion: LaMachine is just a software distribution; it is not an NLP pipeline system (but it may include pipelines in whatever shape or form). The goal is to make things easily installable on a wide variety of systems and in a variety of forms, which is currently a major headache with Newsreader.
You may or may not want to replace your newsreader.sh with something more robust like Nextflow eventually, but that is a completely unrelated matter.