Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid memory allocations #67

Open
Flamefire opened this issue May 11, 2020 · 15 comments
Open

Avoid memory allocations #67

Flamefire opened this issue May 11, 2020 · 15 comments

Comments

@Flamefire
Copy link
Contributor

As the tracer should keep its own impact as low as possible, keeping an eye out for unnecessary memory allocations is vital.

Example:

void region_begin(const std::string& region_name, std::string module, std::string file_name,
called from
scorep::region_begin(region, module, file_name, line_number);

  • module and file_name are initially NULL-terminated C-Strings
  • For calling the function they are converted to std::strings very likely allocation heap memory and copying the contents
  • both are only used in the first region enter, so in most cases the above cost is wasted
  • file_name is then never used as a std::string but converted back to a C-String:
    file_name.c_str(), line_number);
    So all work is wasted completely
  • From module only a substring (prefix) is used (BTW: using substr instead of the ctor would likely read easier) Using resize instead would avoid an allocation and copy and using the find and substr on the original C-String would make it even cheaper

I'm aware this is a Python tracer so relative performance impact is not as large as for e.g. C++ programs but especially allocations can have a noticeable impact. So as un-C++ it is, I'd recommend using raw C-Strings for those 2 and similar for other functions

@AndreasGocht
Copy link
Collaborator

Even worse: Score-P copys the C-String again 😉 .
However, I am not a fan of mixing C-String with C++ Strings. And sticking to C only would mean to implement a hashtable, what is even more problematic in terms of performance.

Moreover, if you are worried about the performance, #5 should be addressed first. Approx 85% of the code are written in python. Each function enter or exit causes a lot of python action, see

def globaltrace_lt(self, frame, why, arg):
ff.

I think I'll reconsider this issue once #5 is fixed. However, I'll leave it open in the meantime. If you think it is necessary to fix this earlier, you are welcome to do a PR respecting the above-stated concerns.

Best,

Andreas

@Flamefire
Copy link
Contributor Author

However, I am not a fan of mixing C-String with C++ Strings. And sticking to C only would mean to implement a hashtable, what is even more problematic in terms of performance.

I see nothing wrong with using what the language offers. C-Strings are part of C++ and there are C++ standard library functions for dealing with them i.e. the entire header (which happen to be the same available in C but in std::, so "blessed") So using a std::string although you actually need a C-Cstring just for the sake of good looks would be the wrong reasoning IMO.

I think I'll reconsider this issue once #5 is fixed. However, I'll leave it open in the meantime.

Sure. I've just seen this and wanted to document it so it doesn't get forgotten.

@AndreasGocht
Copy link
Collaborator

AndreasGocht commented May 12, 2020

I see nothing wrong with using what the language offers. ...

The actual solution seems to be string_view:

https://en.cppreference.com/w/cpp/string/basic_string_view

together with a try_emplace to a std::unorderd_map<string>.

However, this would require C++17.

Best,

Andreas

@Flamefire
Copy link
Contributor Author

Flamefire commented May 12, 2020

Not really:

A typical implementation holds only two members: a pointer to constant CharT and a size.

So you'll at least pay for iterating over the string once to determine the size. So only keeping the C-String until you need a std::string is the least wasteful solution.

Edit: Additionally there is no c_str() to get back the C-String you need

@AndreasGocht
Copy link
Collaborator

Edit: Additionally there is no c_str() to get back the C-String you need

but a data()

@Flamefire
Copy link
Contributor Author

but a data()

... which is not guaranteed to contain a NULL terminated string.

@AndreasGocht
Copy link
Collaborator

AndreasGocht commented May 12, 2020

Which is never guaranteed with an array. But as it is only a view to a NUL terminated string, thats not an issue.

BTW more fun is this:

s# (str, read-only bytes-like object) [const char *, int or Py_ssize_t]

Like s*, except that it doesn’t accept mutable objects. The result is stored into two C variables, the first one a pointer to a C string, the second one its length. The string may contain embedded null > bytes. Unicode objects are converted to C strings using 'utf-8' encoding.

python doc

then you'll get the length for free. Moreover, it's more flexible.

Best,
Andreas

@Flamefire
Copy link
Contributor Author

Which is never guaranteed with an array.

Er what? A const char* is (usually) a C-String which is a NULL-terminated sequence of chars and what is needed to pass to Score-P

But as it is only a view to a NUL terminated string, thats not an issue.

So you want to store your C-String in a struct storing a pointer and size, then pass that to a function, in that function expect the pointer to be a C-String and ignore the size using that struct effectively as a convoluted way of transferring a C-String.

Before this gets more spammy I'll stop. Important info is in first post and #67 (comment).

@AndreasGocht
Copy link
Collaborator

Technical remark:

https://github.com/python/cpython/blob/42bae3a3d9d79f28e6b3b619bd27296d125c4c2c/Python/getargs.c#L989

The code for s#

    else if (PyUnicode_Check(arg)) {
                Py_ssize_t len;
                sarg = PyUnicode_AsUTF8AndSize(arg, &len);
                if (sarg == NULL)
                    return converterr(CONV_UNICODE,
                                      arg, msgbuf, bufsize);
                *p = sarg;
                STORE_SIZE(len);
            }

And the code for s

            else if (PyUnicode_Check(arg)) {
                sarg = PyUnicode_AsUTF8AndSize(arg, &len);
                if (sarg == NULL)
                    return converterr(CONV_UNICODE,
                                      arg, msgbuf, bufsize);
                if (strlen(sarg) != (size_t)len) {
                    PyErr_SetString(PyExc_ValueError, "embedded null character");
                    RETURN_ERR_OCCURRED;
                }
                *p = sarg;
            }

So s does an additional comparison of the returned String to be sure, that there is no NUL character in UTF8 coded Strings. We should be able to avoid this at least for function and module names, as python does not allow NUL as part of Identifiers (see https://www.python.org/dev/peps/pep-3131/#specification-of-language-changes and https://docs.python.org/3.7/reference/lexical_analysis.html#identifiers )

@AndreasGocht
Copy link
Collaborator

The same applies to filenames on UNIX, (see https://unix.stackexchange.com/questions/38055/utf-8-filenames ). So we could avoid this as well.

@AndreasGocht
Copy link
Collaborator

#42

@Flamefire
Copy link
Contributor Author

My proposal would be to use a simple, slim wrapper around a C-String:

class CString{
  CString(const char* s, size_t len); // Could be guarded by assert for tests
  CString(const char[] s); // With template to deduce size from raw strings (trivial)
  bool operator==(const CString&);
  bool operator!=(const CString&);
  size_t find(char) const;
  const char* c_str() const;
  size_t size() const;
}

This is very similar to string_view but focuses on/guarantees NULL-terminated strings. The implementation is incredibly simple so verifiable by just looking at it. Nothing really happening there. And the comparison operators save call sites from using the (IMO strange) C string comparison functions. Some interop functions with std::string could be added too as needed (e.g fast conversion, direct comparison, ...) but those are also very simple.

This then allows accepting the Python values with known size and passing them to functions (like region_begin) that currently take std::string but only pass them as c_str() down to Score-P and maybe compare them or take substrings.

If you agree with this design I'll create a PR with an implementation of this proposal

@AndreasGocht
Copy link
Collaborator

Sounds reasonable. Please go ahead.

Best,

Andreas

@Flamefire
Copy link
Contributor Author

For some numbers: Replacing the std::strings by const char* where it is beneficial results in these numbers:

0.3920-0.4455, mean: 0.4115, median: 0.4073
0.9087-0.9889, mean: 0.9460, median: 0.9460
1.4548-1.5868, mean: 1.5129, median: 1.5108

0.3861-0.4372, mean: 0.4071, median: 0.4066
0.9111-1.0089, mean: 0.9551, median: 0.9547
1.4719-1.5940, mean: 1.5263, median: 1.5262

Top is with C-Strings for 100k, 300k and 500k iterations of the simplefunc benchmark (min, max, mean and median over 51 invocations) and bottom is current master using std::string. All using the cProfile instrumenter

So there is some effect although not much. The small case is even slower for some reason but I think this is an artifact from the small runtime.

I'd expect the effect to be more visible when memory pressure is higher especially as the benchmark uses only 1 function hence the memory allocator might return always the last used chunk which is faster than "real-world" use cases of random length functions

@AndreasGocht
Copy link
Collaborator

AndreasGocht commented Jun 24, 2020

After merging #105 you might want to try again, and use s#. As the creation of region_names is only done, when necessary, this might save a lot of runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants