Memory is not being released in parallel execution #1101

Closed
zoilomora opened this issue Jul 21, 2023 · 8 comments · Fixed by #1147

Comments

@zoilomora

Q                A
Version          2.0.0
Bug?             yes
New feature?     no
Question?        yes
Documentation?   no
Related tickets  ~

When tasks run with parallel.enabled: true, memory is not released and usage exceeds the memory limit configured in PHP.

My configuration

grumphp:
  process_timeout: 120
  ascii:
    failed:
      - config/hooks/ko.txt
    succeeded:
      - config/hooks/ok.txt
  parallel:
    enabled: true
    max_workers: 32
  tasks:
    composer:
      strict: true
    jsonlint: ~
    phpcpd:
      exclude:
        - 'var'
        - 'vendor'
        - 'tests'
      min_lines: 60
    phpcs:
      standard:
        - 'phpcs.xml.dist'
      whitelist_patterns:
        - '/^src\/(.*)/'
        - '/^tests\/(.*)/'
      encoding: 'UTF-8'
    phplint: ~
    phpstan_shell:
      metadata:
        label: phpstan
        task: shell
      scripts:
        - ["-c", "phpstan analyse -l 9 src"]
    phpunit: ~
    behat:
      config: ~
      format: progress
      stop_on_failure: true
    phpversion:
      project: '8.2'
    securitychecker_local:
      lockfile: ./composer.lock
      format: ~

Steps to reproduce:
At the end of the vendor/bin/grumphp file, add the following to print the memory in use when the run finishes:

$memory = memory_get_usage() / 1024 / 1024;
print_r(round($memory, 3) . ' MB' . PHP_EOL);
exit();

Run ./vendor/bin/grumphp run once with each of these settings for parallel.enabled:

  • parallel: false
  • parallel: true

Result:

parallel: false
Used Memory: 32.553 MB

parallel: true
Used Memory: 215.642 MB

When the different tasks are finished, shouldn't the memory be released?

Is this the desired behavior?
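
For anyone repeating the measurement above, one caveat may matter: memory_get_usage() reports what is allocated at the moment of the call, while memory_get_peak_usage() reports the high-water mark of the run, which is closer to what a memory limit reacts to. A minimal standalone sketch of the difference follows. The 20 MB buffer and the toMegabytes() helper are assumptions made for illustration and are not part of GrumPHP.

```php
<?php
// Sketch: current vs. peak memory usage in a single PHP process.
// The buffer is an arbitrary stand-in for data held during a run.

// Hypothetical helper: convert bytes to megabytes, rounded like the snippet above.
function toMegabytes(int $bytes): float
{
    return round($bytes / 1024 / 1024, 3);
}

$before = memory_get_usage();

// Allocate roughly 20 MB, record usage, then release it again.
$buffer = str_repeat('x', 20 * 1024 * 1024);
$during = memory_get_usage();
unset($buffer);

$after = memory_get_usage();      // drops back once the buffer is released
$peak  = memory_get_peak_usage(); // still remembers the 20 MB allocation

echo 'Current: ' . toMegabytes($after) . ' MB' . PHP_EOL;
echo 'Peak:    ' . toMegabytes($peak) . ' MB' . PHP_EOL;
```

Printing both values at the end of vendor/bin/grumphp would show whether the memory is still held when the run ends or was only needed temporarily.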

@veewee
Contributor

veewee commented Jul 21, 2023

Running grumphp in parallel mode opens a separate process for every task you start. The main process and each worker process communicate with each other, and that's probably what is taking up the additional MBs of memory.
I'm not sure whether that memory needs to be freed manually.

However, grumphp is just a tool that finishes at some point.
At that moment, the memory gets freed nevertheless.
Therefore I am not sure if this really is an issue.

So what do you think? Is this really a problem, or do you rather need to increase PHP's memory limit to get grumphp running on your project?

@zoilomora
Author

My understanding is that if these are separate PHP processes, the memory limit should apply to each process individually.

I try to use the same memory limits locally as in production.

If a single process does not take up more than 32 MB, it seems strange to me that all the tasks running in different processes take up more than 215 MB in total.

The current limit is 256 MB, and if I include more files the limit is exceeded. However, memory only runs out when there is just 1 task left to finish.

Wouldn't the desired behavior be to release that memory as tasks finish?

@ashokadewit

I think the memory goes to the serialized task results. The task result contains the context, which contains the file collection. When not running in parallel, this object is passed by reference, but when running in parallel it is serialized for each result. If the number of files is large (5000 files in my case) and there are many tasks (20 in my case), GrumPHP will run out of memory. I solved it by registering a middleware that replaces the file collections with an empty object. I'm not sure whether this file collection is used in any way after a task has completed.
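
The effect described above can be reproduced outside GrumPHP. The following is a minimal sketch, assuming a result object that holds a context with a file list. DemoFileList, DemoContext and DemoResult are hypothetical stand-ins and not GrumPHP classes, and the serialize/unserialize round trip only imitates a result crossing a process boundary.

```php
<?php
// Hypothetical stand-ins that mimic the object graph: result -> context -> file list.
// These are NOT GrumPHP classes.
final class DemoFileList
{
    /** @var string[] */
    public array $paths;

    public function __construct(array $paths)
    {
        $this->paths = $paths;
    }
}

final class DemoContext
{
    public DemoFileList $files;

    public function __construct(DemoFileList $files)
    {
        $this->files = $files;
    }
}

final class DemoResult
{
    public string $task;
    public DemoContext $context;

    public function __construct(string $task, DemoContext $context)
    {
        $this->task = $task;
        $this->context = $context;
    }
}

// 5000 files and 20 tasks, the figures mentioned in the comment above.
$paths = [];
for ($i = 0; $i < 5000; $i++) {
    $paths[] = "src/Module{$i}/File{$i}.php";
}
$context = new DemoContext(new DemoFileList($paths));

// Sequential case: every result holds a handle to the same context object.
$shared = [];
for ($t = 0; $t < 20; $t++) {
    $shared[] = new DemoResult("task{$t}", $context);
}

// Parallel-like case: each result goes through a serialize/unserialize round trip,
// so each one comes back with its own full copy of the file list.
$start = memory_get_usage();
$copied = [];
for ($t = 0; $t < 20; $t++) {
    $copied[] = unserialize(serialize(new DemoResult("task{$t}", $context)));
}
$growth = memory_get_usage() - $start;

echo 'Extra memory held by copied results: ' . round($growth / 1024 / 1024, 3) . ' MB' . PHP_EOL;
```

In this sketch the shared results add almost nothing per task, while the copied results grow with files × tasks, which matches the pattern reported in this issue.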

@veewee
Contributor

veewee commented Jul 11, 2024

> I solved it by registering a middleware to replace the file collections with an empty object.

Can you share your solution?

> I'm not sure if this file collection is used in any way after a task has completed.

Currently not in this repository. However, it's an official extension point, so someone might be relying on it as a feature.

What I'm wondering is: once the task has been executed in a separate worker, the serialized version is no longer used, so it should be garbage collected at that point. I assume the problem is that the context in the result is the worker's deserialized context instead of the initial process's context. It might therefore make sense to swap it back to the original reference, after which garbage collection can kick in.
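
The swap described above can be sketched in isolation. This is only an illustration under stated assumptions: SwapContext and SwapResult are hypothetical classes, withContext() is an invented helper (the thread itself uses reflection on the real TaskResult), and the serialize/unserialize round trip stands in for a result returned by a worker. A WeakReference is used to observe whether the worker's copy is still alive.

```php
<?php
// Hypothetical stand-ins, NOT GrumPHP classes.
final class SwapContext
{
    /** @var string[] */
    public array $files;

    public function __construct(array $files)
    {
        $this->files = $files;
    }
}

final class SwapResult
{
    private SwapContext $context;

    public function __construct(SwapContext $context)
    {
        $this->context = $context;
    }

    public function getContext(): SwapContext
    {
        return $this->context;
    }

    // Invented helper: return a new result that points at the given context.
    public function withContext(SwapContext $context): self
    {
        return new self($context);
    }
}

$original = new SwapContext(['src/A.php', 'src/B.php']);

// Imitate a result coming back from a worker: it carries its own context copy.
$result = unserialize(serialize(new SwapResult($original)));
$wasCopy = $result->getContext() !== $original;

// Track the worker's copy without keeping it alive.
$workerCopy = WeakReference::create($result->getContext());

// Swap the copy for the original context. Nothing references the copy anymore,
// so PHP's reference counting can free it right away.
$result = $result->withContext($original);

echo 'Worker copy released: ' . ($workerCopy->get() === null ? 'yes' : 'no') . PHP_EOL;
```

Once every result points at the original context again, only one file collection stays in memory regardless of the number of tasks.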

@ashokadewit

> Can you share your solution?

Sure:

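namespace My;

use Amp\Promise;
use GrumPHP\Collection\FilesCollection;
use GrumPHP\Runner\TaskHandler\Middleware\TaskHandlerMiddlewareInterface;
use GrumPHP\Runner\TaskResult;
use GrumPHP\Runner\TaskRunnerContext;
use GrumPHP\Task\Context\RunContext;
use GrumPHP\Task\TaskInterface;
use ReflectionProperty;

class UnsetFilesMiddleware implements TaskHandlerMiddlewareInterface
{
    /**
     * Unset files from task results.
     *
     * @param TaskInterface     $task
     * @param TaskRunnerContext $runnerContext
     * @param callable          $next
     *
     * @return Promise
     */
    public function handle(TaskInterface $task, TaskRunnerContext $runnerContext, callable $next): Promise
    {
        $result = $next($task, $runnerContext);
        if ($result instanceof Promise) {
            // Once the task has resolved, replace the (deserialized) context with an empty one.
            $result->onResolve(
                function ($exception, $value): void {
                    if ($value instanceof TaskResult) {
                        // The context property is private, so it is overwritten via reflection.
                        $property = new ReflectionProperty($value, 'context');
                        $property->setAccessible(true);
                        $property->setValue($value, new RunContext(new FilesCollection([])));
                    }
                }
            );
        }

        return $result;
    }
}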

And then registered in grumphp.yml with:

services:
  My\UnsetFilesMiddleware:
    tags:
      - name: grumphp.task_handler
        priority: 500

I'm still using version 1.5.1 of GrumPHP, so I'm not sure whether this is also compatible with the newest version.

@veewee
Contributor

veewee commented Jul 11, 2024

Can you verify that swapping the "serialized" context coming back from the worker with the original context also does the trick?

                        $property = new ReflectionProperty($value, 'context');
                        $property->setAccessible(true);
-                       $property->setValue($value, new RunContext(new FilesCollection([])));
+                       $property->setValue($value, $runnerContext);

> I'm still using version 1.5.1 of Grumphp, not sure if this also compatible with the newest version.

In 2.0 the async execution system changed, but it still uses the context coming back from the worker. So I assume it will have similar issues.

@ashokadewit

> Can you verify swapping the "serialized" context coming back from the worker with the original context also does the trick?

It does :)

@veewee
Contributor

veewee commented Aug 2, 2024

@ashokadewit Can you confirm the fix in #1147 would resolve the issue?
