[Question]: About pre-fork model #641

Open
Howar-sz opened this issue Jul 11, 2024 · 3 comments

@Howar-sz

Hi, I'm a pyftpdlib user, and I'm looking to enhance the performance of my FTP server implementation. I came across the pre-fork model in the tutorial (https://github.com/giampaolo/pyftpdlib/blob/master/docs/tutorial.rst#pre-fork), but I'm having difficulty grasping how worker processes acquire connections. I attempted to integrate this model into unix_daemon.py, but it didn't yield any significant performance improvements.

# 50k files, 64k size, 1 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277063424 bytes transferred in 217 seconds (14.41M/s)
real	3m42.103s
user	1m25.694s
sys	0m15.360s

# 50k files, 64k size, 2 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277128960 bytes transferred in 518 seconds (6.03M/s)
real	5m0.634s
user	1m33.642s
sys	0m18.385s

# 50k files, 64k size, 4 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277260032 bytes transferred in 1123 seconds (2.78M/s)
real	5m0.588s
user	1m42.999s
sys	0m22.693s

# 50k files, 64k size, 8 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277704304 bytes transferred in 1878 seconds (1.66M/s)
real	5m0.585s
user	1m26.395s
sys	0m19.545s

Looking forward to hearing from you.

@giampaolo
Owner

> I'm having difficulty grasping how worker processes acquire connections.

As far as I remember, the parent / master process "passes" every new connection to one of the workers, so this may make things slower compared to the 1 process async model. If this is true, you may have more luck changing your benchmark so that it downloads, say, 10 files of 1G each instead of 50k files of 64K each. But it's just a supposition.

Also, what are you using for your benchmarks? Is it a single client downloading the files serially, or are there multiple clients in parallel?

Note: I've never conducted benchmarks for the pre-fork model, so you're a pioneer in this sense. :)

@giampaolo
Owner

PS: I see you're from Shenzhen. My wife is from there. :-)

@Howar-sz
Author

> As far as I remember, the parent / master process "passes" every new connection to one of the workers, so this may make things slower compared to the 1 process async model. If this is true, you may have more luck changing your benchmark so that it downloads, say, 10 files of 1G each instead of 50k files of 64K each. But it's just a supposition.

Does "passes" mean that every new FTP connection has to be allocated by the parent/master process? If a subprocess is busy, will the parent process wait?
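For comparison, here is a minimal sketch of one common way a pre-fork server is built: the parent binds the listening socket once, forks the workers, and each worker calls `accept()` on the inherited socket itself, so the kernel (not the parent) decides which worker gets a connection. This is a generic, stdlib-only illustration for Unix; whether pyftpdlib distributes connections this way is not confirmed in this thread, and the names `worker_loop` and `run_prefork_demo` are made up for the example.

```python
import os
import signal
import socket


def worker_loop(listener: socket.socket) -> None:
    """Runs in a child process: accept on the inherited listening socket."""
    while True:
        # Every worker blocks here; the kernel wakes one of them per connection.
        conn, _ = listener.accept()
        with conn:
            conn.sendall(str(os.getpid()).encode())


def run_prefork_demo(num_workers: int = 2, num_clients: int = 6):
    """Return (worker_pids, pids_that_answered) for a short local run."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()  # Unix only
        if pid == 0:  # child: serve until killed, never return to the caller
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        worker_pids.append(pid)

    answered = []
    try:
        for _ in range(num_clients):
            with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
                data = b""
                while chunk := client.recv(64):  # read until the worker closes
                    data += chunk
                answered.append(int(data.decode()))
    finally:
        # The parent never accepts; it only supervises and cleans up.
        for pid in worker_pids:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)
        listener.close()
    return worker_pids, answered


if __name__ == "__main__":
    pids, replies = run_prefork_demo()
    print("workers:", pids)
    print("answered by:", replies)
```

In this layout a busy worker simply is not waiting in `accept()`, so new connections go to whichever worker is idle; the parent does not queue or hand over anything.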

> Also, what are you using for your benchmarks? Is it a single client downloading the files serially, or are there multiple clients in parallel?

It's an upload test. I use lftp with the -e "mirror -R -c -P <parallel>" arguments as my benchmark tool. I believe lftp opens multiple FTP connections in parallel when the parallel argument is greater than one.
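Since the parallel runs report more transfer seconds than the wall-clock `real` time, it may help to compute throughput both ways. The sketch below only redoes the arithmetic on the figures posted earlier in this issue; it assumes that lftp's "M/s" means MiB/s and that the `real` line from `time` is the relevant wall-clock duration. Why the reported seconds exceed wall-clock time (for example, being summed across connections) is a guess, not something confirmed here.

```python
MIB = 1024 * 1024

# (parallel connections, bytes transferred, seconds reported by lftp,
#  wall-clock "real" seconds from `time`), copied from the runs above.
RUNS = [
    (1, 3277063424, 217, 3 * 60 + 42.103),
    (2, 3277128960, 518, 5 * 60 + 0.634),
    (4, 3277260032, 1123, 5 * 60 + 0.588),
    (8, 3277704304, 1878, 5 * 60 + 0.585),
]


def throughput_mib_s(nbytes: int, seconds: float) -> float:
    """Throughput in MiB/s for a given byte count and duration."""
    return nbytes / seconds / MIB


def summarize(runs=RUNS):
    """Return (parallel, MiB/s by reported seconds, MiB/s by wall clock)."""
    return [
        (parallel, throughput_mib_s(nbytes, reported), throughput_mib_s(nbytes, real))
        for parallel, nbytes, reported, real in runs
    ]


if __name__ == "__main__":
    for parallel, by_reported, by_wall in summarize():
        print(f"-P {parallel}: {by_reported:5.2f} MiB/s (reported seconds), "
              f"{by_wall:5.2f} MiB/s (wall clock)")
```

On these figures the wall-clock rate for the parallel runs works out to roughly 10.4 MiB/s each, against roughly 14 MiB/s for the serial run, so the parallel runs are still slower but not by the margin the reported M/s values suggest.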
