Newer API #3

pnappa · 2018-11-17T10:54:21Z

Our current API is a bit clunky and doesn't allow supporting all of the proposed features we wanted.

So, here's some newer endpoints I propose:

Executing a basic subprocess

/**
 * Execute a subprocess and optionally call a function per line of stdout.
 * @param commandPath   - the path of the executable to execute, e.g. "/bin/cat"
 * @param commandArgs   - the extra arguments for an executable e.g. {"argument 1", "henlo"}
 * @param stdinInput    - a list of inputs that will be piped into the processes' stdin
 * @param lambda        - a function that is called with every line from the executed process (default NOP function)
 * @param env           - a list of environment variables that the process will execute with (default nothing)
 */
int execute(const std::string& commandPath, const std::vector<std::string>& commandArgs, const std::vector<std::string>& stdinInput, const std::function<void(std::string)>& lambda = [](std::string){}, const std::vector<std::string>& env = {});

Also, another execute function will exist, except that instead of taking in a std::vector to supply its stdin, it takes two input iterators (the start and end of a range). It's signature would look like this:

template<class InputIt>
int execute(const std::string& commandPath, const std::vector<std::string>& commandArgs, InputIt stdioBegin, InputIt stdioEnd, const std::function<void(std::string)>& lambda = [](std::string){}, const std::vector<std::string>& env)

Where the InputIt would require typical iterator methods implemented (similar to InputIterator defined here: https://en.cppreference.com/w/cpp/named_req/InputIterator).

Retrieving subprocess output

Similar to the execute function's parameters, the check_output function (name TBD, this name is just copied from the python3 subprocess library, we should check if they have better names (or there's internal conflict about how shit that name is)) simply returns a vector containing all lines of stdout.

std::vector<std::string> check_output(const std::string& commandPath, const std::vector<std::string>& commandArgs, const std::vector<std::string>& stdioInput, const std::vector<std::string>& env= {})
template<class InputIt>
std::vector<std::string> check_output(const std::string& commandPath, const std::vector<std::string>& commandArgs, InputIt stdioBegin, InputIt stdioEnd, const std::vector<std::string>& env = {});

Input iterator is similar to the above restrictions in the execute function.

Interactive input/output

To enable dynamically setting what should be fed into the program based on the output, we introduce a newer ProcessStream class. This class has only a few methods available, but allows feeding a string into the process, and checking whether a line is available to read (there may not be one always available for each stdin, consider grep when the line doesn't match) and also extracting it.

I imagine it's use will be something along these lines:

// ctor
ProcessStream(const std::string& commandPath, const std::vector<std::string>& commandArgs, const std::vector<std::string>& env);

ProcessStream ps("/bin/cat", {});

ps << "henlo world\n";

if (ps.ready()) {
    std::string output;
    output << ps;
    std::cout << output;
}

There will naturally be non-operator overload equivalents to those endpoints (most likely something like .write(std::string) and std::string .read(timeout).

We require a timeout, because it's not really possible to check if there is a line available, without some kind of call to select (I think!).

Daemon spawning

People seem to like to spawn and forget processes (especially in some kind of management application), so providing a method to disown and mark them as daemons makes a whole lotta sense. I haven't written daemon code too often (it's been a couple years since I have I reckon), and that was in pure C. As this library has to use C to implement these features it's certainly doable, but would require different behaviour per OS, because this I imagine heavily differs between OSes. The problem is, I don't really know what a nice interface to doing so would be.

Any suggestions would be much appreciated.

Error Handling

As we're dealing with external applications, they may throw at any time and inconsistently (either due to the application, the OS/user killing them, or bugs), we need some kind of error handling.

Currently, the above APIs proposed return an error status, but I feel like this may not be very idiomatic, and that exceptions should be used instead. I suppose this requires a read up on C++ philosophy before being able to make a decision, but I'm thinking of mirroring how Python's subprocess library does it, which is pretty much to throw an exception whenever anything goes wrong, including non-zero return statuses.

Asynchronous Calls

Whether the user should be responsible for running this stuff in the background is again another philosophical question to ask, but as this is a library aimed at dumb-dumbs, its worth consideration if this is something we should hold their hand on. Currently, we provide a method to spawn a subprocess asynchronously, and return a std::future of the return code. Naturally, if we change the error handling to be a different strategy, this may not make sense, but I can't imagine asynchronous informing the main process about the status via exceptions can't be neat, so using futures on the return value seems like a good idea to cover this.

Final remarks

In my mind, these cover the current TODOs I propose in the current README, so it seems like a good direction to head. The only thing that I don't know and could cause problems in these interfaces is how Windows deals with processes. Some of these things may not be possible.

Any comments are appreciated (and likely only Cameron is gonna comment, but I'll try and hook some people to comment).

Cheers

The text was updated successfully, but these errors were encountered:

pnappa · 2018-11-17T13:45:39Z

Thinking some more, another thing that I've wanted in the past is the ability to pipe in a C++ program. Using system() is dangerous if the string is able to be tainted, whilst with exec/popen etc, it's fine as the command arguments are strictly delineated. So, it's a bit annoying to invoke programs that use one another's output as their input.

I can't really imagine what it would look like either, but perhaps something like: auto ps = PipedProcess("/bin/cat", {}).pipe("/bin/cat", {}).pipe("/bin/grep", {"-i", "ok"});

So that they're chained perhaps...?

CameronHosking · 2018-11-17T14:26:30Z

With regards to the daemon we can probably just call daemon. The interface would probably be similar to execute but without the lambda since you would be deliberately disconnecting from it. This will be much more "interesting" to do with windows.

CameronHosking · 2018-11-18T02:14:10Z

Regarding error handling, I'm having a read of this paper by herb in which he suggests that both forms are inadequate. It hasn't help me pick one.

pnappa · 2018-11-18T13:49:23Z

I'm thinking that we should stick to exceptions, with some of the enlightenment from that doc. Specifically because we're most likely going to have some operator overloads (>> and << at least) and that the kind of things that these interact with may be error prone - consider writing strings to a closed process (mid-way through writing or otherwise), what do you do? Well, it's not really feasible to return values from this stuff, so exceptions are the only real good option in this case.

Yep, that daemon call seems good - another interesting thing is what would happen to the output? I can't remember what happens for daemons, so I'm not sure, but should we mimic something like nohup and have an explicit file set up for the daemon to output to?

CameronHosking · 2018-11-19T02:54:07Z

With regards to the new API I think it is strange that we require a list of strings for the arguments and a vector for the inputs. Is there a particular reason we can't just accept any iterable container of strings, or go full STL and only accept begin and end iterators for both?

pnappa · 2018-11-19T04:03:36Z

I suppose it's feasible - I've never seen the stdlib actually support that idea though.

I imagine the declaration would look something like this then:

template<class Container, class Container2>
int execute(const std::string& processPath, const Container& processArguments, const Container2& processStdin, ...);

Where Container must be at least forward iterable (in C++ land, this just have .begin() and .end(), and the iterator has to be an InputIterator as described before). I used different containers for the args and the input, because the user may well different containers for both.

Looking at something from the stdlib that I've been annoyed that doesn't have this signature would be std::find - https://en.cppreference.com/w/cpp/algorithm/find . They always take in iterators, but they just as well could use the container templating idea as mentioned above.

I agree that it's a good idea!

pnappa · 2018-11-19T05:31:32Z

How about we include all combinations? Eight is quite a lot (env also makes sense, right?)

int execute(path, const Container& args, const Container2& stdin, const Container& env);
int execute(path, const Container& args, const Container2& stdin, Iterator beginEnv, Iterator endEnv);
int execute(path, const Container& args, Iterator beginStdin, Iterator endStdin, const Container& env);
int execute(path, const Container& args, Iterator beginStdin, Iterator endStdin, Iterator beginEnv, Iterator endEnv);
int execute(path, Iterator argsBegin, Iterator argsEnd, const Container2& stdin, const Container& env);
int execute(path, Iterator argsBegin, Iterator argsEnd, const Container2& stdin,  Iterator beginEnv, Iterator endEnv);
int execute(path, Iterator argsBegin, Iterator argsEnd, Iterator beginStdin, Iterator endStdin, const Container& env);
int execute(path, Iterator argsBegin, Iterator argsEnd, Iterator beginStdin, Iterator endStdin, Iterator beginEnv, Iterator endEnv);

CameronHosking · 2018-11-19T05:50:00Z

all combinations? ick

pnappa · 2018-11-19T05:56:22Z

Just two?

int execute(path, const Container& args, const Container2& stdin, const Container& env);
int execute(path, Iterator argsBegin, Iterator argsEnd, Iterator beginStdin, Iterator endStdin, Iterator beginEnv, Iterator endEnv);

No builder pattern here, because the users probably won't like it

CameronHosking · 2018-11-19T06:21:04Z

I think just two but I'll just be using the iterator version for the internal Process class

pnappa · 2018-11-19T06:38:40Z

Alright, I'll update my branch when I get home (i'm working on the new API on the new_api branch on this repo)

pnappa · 2018-11-19T06:41:40Z

But you feel free to deviate, we'll be able to merge when you come back in

CameronHosking · 2018-11-19T23:17:14Z

The default behavior for iostreams is not to throw exceptions, you have to explicitly set what failure modes should throw exceptions https://en.cppreference.com/w/cpp/io/basic_ios/exceptions

pnappa · 2018-11-20T10:16:12Z

For the execute and check_output fns I ended up settling with this format:

Execute

/**
 * Execute a subprocess and optionally call a function per line of stdout.
 * @param commandPath   - the path of the executable to execute, e.g. "/bin/cat"
 * @param firstArg      - the begin iterator for a list of arguments
 * @param lastArg       - the end iterator for a list of arguments
 * @param stdinBegin    - an InputIterator to provide stdin
 * @param stdinEnd      - the end of the InputIterator range for stdin
 * @param lambda        - a function that is called with every line from the executed process (default NOP function)
 * @param envBegin      - the begin of an iterator containing process environment variables to set
 * @param envEnd        - the end of the env iterator
 */
template<class ArgIt, class StdinIt, class EnvIt, 
        typename = typename std::enable_if<is_iterator<ArgIt>::value, void>::type,
        typename = typename std::enable_if<is_iterator<StdinIt>::value, void>::type,
        typename = typename std::enable_if<is_iterator<EnvIt>::value, void>::type>
int execute(const std::string& commandPath, ArgIt firstArg, ArgIt lastArg, 
            StdinIt stdinBegin = DevNullIterator(), StdinIt stdinEnd = DevNullIterator(), 
            const std::function<void(std::string)>& lambda = [](std::string){}, 
            EnvIt envBegin = DevNullIterator(), EnvIt envEnd = DevNullIterator());
/**
 * Execute a subprocess and optionally call a function per line of stdout.
 * @param commandPath   - the path of the executable to execute, e.g. "/bin/cat"
 * @param commandArgs   - the extra arguments for an executable e.g. {"argument 1", "henlo"} (default no arguments)
 * @param stdinInput    - a list of inputs that will be piped into the processes' stdin (default no stdin)
 * @param lambda        - a function that is called with every line from the executed process (default NOP function)
 * @param env           - a list of environment variables that the process will execute with (default nothing)
 */
template<class ArgIterable = std::vector<std::string>, 
         class StdinIterable = std::vector<std::string>, 
         class EnvIterable = std::vector<std::string>,
         typename = typename std::enable_if<is_iterable<ArgIterable>::value, void>::type,
         typename = typename std::enable_if<is_iterable<StdinIterable>::value, void>::type,
         typename = typename std::enable_if<is_iterable<EnvIterable>::value, void>::type>
int execute(const std::string& commandPath, const ArgIterable& commandArgs = {}, 
            const StdinIterable& stdinInput = {}, const std::function<void(std::string)>& lambda = [](std::string){}, 
            const EnvIterable& env = {});

I made the arguments optional now, so people can write things like:

// don't even have to provide args (they're default empty)
execute("/cool/path/shellscript.sh");
// for the iterator version you must provide some (at least the arg iterator)
// this is because there's no way to differentiate between the two then. 
std::vector<std::string> args = {"argone", "argtwo", "arg3"};
execute("/cool/path/shellscript.sh", args.begin(), args.begin()+1);

check_output

Similar for check_output.

/**
 * Execute a subprocess and retrieve the output of the command
 * @param commandPath   - the path of the executable to execute, e.g. "/bin/cat"
 * @param firstArg      - the begin iterator for a list of arguments
 * @param lastArg       - the end iterator for a list of arguments
 * @param stdinBegin    - an InputIterator to provide stdin
 * @param stdinEnd      - the end of the InputIterator range for stdin
 * @param lambda        - a function that is called with every line from the executed process (default NOP function)
 * @param envBegin      - the begin of an iterator containing process environment variables to set
 * @param envEnd        - the end of the env iterator
 */
template<class ArgIt, class StdinIt, class EnvIt, 
        typename = typename std::enable_if<is_iterator<ArgIt>::value, void>::type,
        typename = typename std::enable_if<is_iterator<StdinIt>::value, void>::type,
        typename = typename std::enable_if<is_iterator<EnvIt>::value, void>::type>
std::vector<std::string> check_output(const std::string& commandPath, ArgIt firstArg, ArgIt lastArg,
                                      StdinIt stdinBegin = DevNullIterator(), StdinIt stdinEnd = DevNullIterator(), 
                                      EnvIt envBegin = DevNullIterator(), EnvIt envEnd = DevNullIterator());

/**
 * Execute a subprocess and retrieve the output of the command
 * @param commandPath   - the path of the executable to execute, e.g. "/bin/cat"
 * @param commandArgs   - the extra arguments for an executable e.g. {"argument 1", "henlo"}
 * @param stdinInput    - a list of inputs that will be piped into the processes' stdin
 * @param env           - a list of environment variables that the process will execute with (default nothing)
 */
template<class ArgIterable = std::vector<std::string>, 
         class StdinIterable = std::vector<std::string>, 
         class EnvIterable = std::vector<std::string>,
         typename = typename std::enable_if<is_iterable<ArgIterable>::value, void>::type,
         typename = typename std::enable_if<is_iterable<StdinIterable>::value, void>::type,
         typename = typename std::enable_if<is_iterable<EnvIterable>::value, void>::type>
std::vector<std::string> check_output(const std::string& commandPath, const ArgIterable& commandArgs = {}, 
                                      const StdinIterable& stdinInput = {}, const EnvIterable& env = {}) ;

Unfortunately, there's not really any room for the check_output return code. Any ideas? I don't want to return a tuple for the check_output (like std::pair<std::vectorstd::string, int> or w/e), because it increases the difficulty to use this library.

CameronHosking · 2018-11-20T11:00:15Z

What even would be the return codes? It just returns the std::out of the executed process. Are you talking about error codes?

pnappa · 2018-11-20T11:06:22Z

yeah, the process's return code (0 for success, other values for failure)

CameronHosking · 2018-11-20T11:09:13Z

So we're not doing exceptions? Or is this for non exceptional cases.

pnappa · 2018-11-20T11:13:35Z

yeah, that's what I feel like makes sense, especially for that as there's no way to even indicate error codes nicely.

exceptions it is? we may as well add these to the others then too.

CameronHosking · 2018-11-20T11:18:17Z

yeah can't really use error codes with a stateless call. regular streams store an error code internally.

pnappa · 2018-11-20T11:19:02Z

err, waitaminute, if we're throwing exceptions on non-zero statuses for execute, why are we returning an int for execute?

maybe it does make sense for only check_output to throw exceptions, each of the other interfaces can access the error states/codes.

CameronHosking · 2018-11-20T11:24:24Z

legacy, from when I thought we might do error codes

syncing branch

pnappa · 2018-11-21T14:48:40Z

I've merged your stuff in the new_api branch.

From our discussion we seemed to settle on stuff like this:

Process p("/bin/cat", {"filename.out"}, {}, [&](std::string s) { std::cout << s; });
Process p2("/bin/grep", {"-i", "cool"}, {}, [&](..){ ok; });
Process p2("/usr/bin/lolcat", {});
std::ifstream myfile("outti.out");

p.pipe_to(p3); 
p.pipe_to(p2).pipe_to(p3) >> myfile;  // or pipe_to(myfile)..?
// equivalent to p >> p2 >> p3 >> myfile;

p.start();


p << 1 << std::endl;
// this shouldn't be allowed
// std::string output;
// p >> output;

The current interface for the facing Process (still haven't renamed to ProcessStream) is:

class ProcessStream {
        // need some list of processes this process is supposed to pipe to
        // shouldn't need to store the lambda, as the internal::Process object will store that
        std::vector<ProcessStream&> nextProcesses;
    public:
        template<typename Functor = std::function<void(std::string)>>
            ProcessStream(const std::string& commandPath, const std::vector<std::string>& commandArgs, const Functor& func = [](std::string){});

        ~ProcessStream();

        // start the process and prevent any more pipes from being established.
        // may throw an exception?
        bool start();

        // write a line to the subprocess's stdin
        void write(const std::string& inputLine);
        // read a line and block until received (or until timeout reached)
        template<typename Rep>
            std::string read(std::chrono::duration<Rep> timeout=-1);
        // if there is a line for reading (optionally 
        template<typename Rep>
            bool ready(std::chrono::duration<Rep> timeout=0);

        // pipe some data to the receiver process, and return the receiver process
        // we do this so we can have: process1.pipe_to(process2).pipe_to(process3)...etc
        // if pipes are set up there are some restrictions on using the << and >> operators.
        // if a process is receiving from another process, then they cannot use operator<< anymore
        // if a process is outputting to another, they cannot use operator>> 
        ProcessStream& pipe_to(ProcessStream& receiver);
        // ditto
        ProcessStream& operator>>(ProcessStream& receiver);

        // read a line into this process (so it acts as another line of stdin)
        // instead of string, probably should be typename Stringable, and use stringstream and stuff.
        ProcessStream& operator<<(const std::string& inputLine);
        // retrieve a line of stdout from this process
        ProcessStream& operator>>(std::string& outputLine);

        // some other functions which maybe useful (perhaps take a timeout?)
        // returns whether it could terminate
        bool terminate();
        // a more...extreme way
        bool kill();
        // send arbitrary signals to the subprocess
        void signal(int signum);


        class iterator;

        // provide an iterator to iterate over the stdout produced
        iterator begin();
        iterator end();
};

We also had a todo list:


things
======
daemonize should close stdin and stdout and stderr (apparently this is what daemons expect)
    - cameron is saying that instead of closing (except stdin), we should redirect them to /dev/null
stderr and stdout shouldn't be merged (consider the dup2 call where they're both written to the same pipe)
    - provide a method for them to be able to execute and retrieve stderr?
    - python's subprocess seems to just by default (for check_output at least) only retrieve the stdout.
rename execute to run
we need to put a read (and call lambda) during the writing loop (the one that writes to the process stdin) so that the read pipe isn't clogged on the other end (consequentially blocking at the write pipe)
    - this will be easy when we write ProcessStream (which will be renamed Process)
    - make sure that we exhaust the .is_ready() and read() thing within the writing loop (what if the process outputs two lines for every input line..?)
We need to move the detail (the classes that are currently called Process and TwoWayPipe into a "private" namespace called detail


currently ignore async

cameron is fixing his stuff  up

setup travis for osx

potentially setup travis for windows

CameronHosking added a commit that referenced this issue Nov 21, 2018

Merge pull request #3 from pnappa/new_api

2bef28d

syncing branch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Newer API #3

Newer API #3

pnappa commented Nov 17, 2018 •

edited

Loading

pnappa commented Nov 17, 2018

CameronHosking commented Nov 17, 2018

CameronHosking commented Nov 18, 2018

pnappa commented Nov 18, 2018

CameronHosking commented Nov 19, 2018 •

edited

Loading

pnappa commented Nov 19, 2018 •

edited

Loading

pnappa commented Nov 19, 2018 •

edited

Loading

CameronHosking commented Nov 19, 2018

pnappa commented Nov 19, 2018 •

edited

Loading

CameronHosking commented Nov 19, 2018 •

edited

Loading

pnappa commented Nov 19, 2018

pnappa commented Nov 19, 2018

CameronHosking commented Nov 19, 2018

pnappa commented Nov 20, 2018

CameronHosking commented Nov 20, 2018

pnappa commented Nov 20, 2018

CameronHosking commented Nov 20, 2018

pnappa commented Nov 20, 2018

CameronHosking commented Nov 20, 2018

pnappa commented Nov 20, 2018 •

edited

Loading

CameronHosking commented Nov 20, 2018

pnappa commented Nov 21, 2018

Newer API #3

Newer API #3

Comments

pnappa commented Nov 17, 2018 • edited Loading

Executing a basic subprocess

Retrieving subprocess output

Interactive input/output

Daemon spawning

Error Handling

Asynchronous Calls

Final remarks

pnappa commented Nov 17, 2018

CameronHosking commented Nov 17, 2018

CameronHosking commented Nov 18, 2018

pnappa commented Nov 18, 2018

CameronHosking commented Nov 19, 2018 • edited Loading

pnappa commented Nov 19, 2018 • edited Loading

pnappa commented Nov 19, 2018 • edited Loading

CameronHosking commented Nov 19, 2018

pnappa commented Nov 19, 2018 • edited Loading

CameronHosking commented Nov 19, 2018 • edited Loading

pnappa commented Nov 19, 2018

pnappa commented Nov 19, 2018

CameronHosking commented Nov 19, 2018

pnappa commented Nov 20, 2018

Execute

check_output

CameronHosking commented Nov 20, 2018

pnappa commented Nov 20, 2018

CameronHosking commented Nov 20, 2018

pnappa commented Nov 20, 2018

CameronHosking commented Nov 20, 2018

pnappa commented Nov 20, 2018 • edited Loading

CameronHosking commented Nov 20, 2018

pnappa commented Nov 21, 2018

pnappa commented Nov 17, 2018 •

edited

Loading

CameronHosking commented Nov 19, 2018 •

edited

Loading

pnappa commented Nov 19, 2018 •

edited

Loading

pnappa commented Nov 19, 2018 •

edited

Loading

pnappa commented Nov 19, 2018 •

edited

Loading

CameronHosking commented Nov 19, 2018 •

edited

Loading

pnappa commented Nov 20, 2018 •

edited

Loading