Avoid memory allocations #67

Flamefire · 2020-05-11T15:20:24Z

As the tracer should keep its own impact as low as possible, keeping an eye out for unnecessary memory allocations is vital.

Example:

scorep_binding_python/src/scorep.cpp

Line 23 in f9fbd7d

    
           void region_begin(const std::string& region_name, std::string module, std::string file_name,

called from

scorep_binding_python/src/scorep.cpp

Line 167 in f9fbd7d

scorep::region_begin(region, module, file_name, line_number);

module and file_name are initially NULL-terminated C-Strings
For calling the function they are converted to std::strings very likely allocation heap memory and copying the contents
both are only used in the first region enter, so in most cases the above cost is wasted
file_name is then never used as a std::string but converted back to a C-String:

scorep_binding_python/src/scorep.cpp

Line 33 in f9fbd7d

file_name.c_str(), line_number);

So all work is wasted completely
From module only a substring (prefix) is used (BTW: using substr instead of the ctor would likely read easier) Using resize instead would avoid an allocation and copy and using the find and substr on the original C-String would make it even cheaper

I'm aware this is a Python tracer so relative performance impact is not as large as for e.g. C++ programs but especially allocations can have a noticeable impact. So as un-C++ it is, I'd recommend using raw C-Strings for those 2 and similar for other functions

The text was updated successfully, but these errors were encountered:

AndreasGocht · 2020-05-12T07:06:37Z

Even worse: Score-P copys the C-String again 😉 .
However, I am not a fan of mixing C-String with C++ Strings. And sticking to C only would mean to implement a hashtable, what is even more problematic in terms of performance.

Moreover, if you are worried about the performance, #5 should be addressed first. Approx 85% of the code are written in python. Each function enter or exit causes a lot of python action, see

scorep_binding_python/scorep/instrumenters/scorep_profile.py

Line 76 in f9fbd7d

def globaltrace_lt(self, frame, why, arg):

ff.

I think I'll reconsider this issue once #5 is fixed. However, I'll leave it open in the meantime. If you think it is necessary to fix this earlier, you are welcome to do a PR respecting the above-stated concerns.

Best,

Andreas

Flamefire · 2020-05-12T07:27:17Z

However, I am not a fan of mixing C-String with C++ Strings. And sticking to C only would mean to implement a hashtable, what is even more problematic in terms of performance.

I see nothing wrong with using what the language offers. C-Strings are part of C++ and there are C++ standard library functions for dealing with them i.e. the entire header (which happen to be the same available in C but in std::, so "blessed") So using a std::string although you actually need a C-Cstring just for the sake of good looks would be the wrong reasoning IMO.

I think I'll reconsider this issue once #5 is fixed. However, I'll leave it open in the meantime.

Sure. I've just seen this and wanted to document it so it doesn't get forgotten.

AndreasGocht · 2020-05-12T13:35:01Z

I see nothing wrong with using what the language offers. ...

The actual solution seems to be string_view:

https://en.cppreference.com/w/cpp/string/basic_string_view

together with a try_emplace to a std::unorderd_map<string>.

However, this would require C++17.

Best,

Andreas

Flamefire · 2020-05-12T13:50:06Z

Not really:

A typical implementation holds only two members: a pointer to constant CharT and a size.

So you'll at least pay for iterating over the string once to determine the size. So only keeping the C-String until you need a std::string is the least wasteful solution.

Edit: Additionally there is no c_str() to get back the C-String you need

AndreasGocht · 2020-05-12T14:11:00Z

Edit: Additionally there is no c_str() to get back the C-String you need

but a data()

Flamefire · 2020-05-12T14:13:06Z

but a data()

... which is not guaranteed to contain a NULL terminated string.

AndreasGocht · 2020-05-12T14:19:21Z

Which is never guaranteed with an array. But as it is only a view to a NUL terminated string, thats not an issue.

BTW more fun is this:

s# (str, read-only bytes-like object) [const char *, int or Py_ssize_t]

Like s*, except that it doesn’t accept mutable objects. The result is stored into two C variables, the first one a pointer to a C string, the second one its length. The string may contain embedded null > bytes. Unicode objects are converted to C strings using 'utf-8' encoding.

python doc

then you'll get the length for free. Moreover, it's more flexible.

Best,
Andreas

Flamefire · 2020-05-12T14:26:38Z

Which is never guaranteed with an array.

Er what? A const char* is (usually) a C-String which is a NULL-terminated sequence of chars and what is needed to pass to Score-P

But as it is only a view to a NUL terminated string, thats not an issue.

So you want to store your C-String in a struct storing a pointer and size, then pass that to a function, in that function expect the pointer to be a C-String and ignore the size using that struct effectively as a convoluted way of transferring a C-String.

Before this gets more spammy I'll stop. Important info is in first post and #67 (comment).

AndreasGocht · 2020-05-13T08:58:29Z

Technical remark:

https://github.com/python/cpython/blob/42bae3a3d9d79f28e6b3b619bd27296d125c4c2c/Python/getargs.c#L989

The code for s#

    else if (PyUnicode_Check(arg)) {
                Py_ssize_t len;
                sarg = PyUnicode_AsUTF8AndSize(arg, &len);
                if (sarg == NULL)
                    return converterr(CONV_UNICODE,
                                      arg, msgbuf, bufsize);
                *p = sarg;
                STORE_SIZE(len);
            }

And the code for s

            else if (PyUnicode_Check(arg)) {
                sarg = PyUnicode_AsUTF8AndSize(arg, &len);
                if (sarg == NULL)
                    return converterr(CONV_UNICODE,
                                      arg, msgbuf, bufsize);
                if (strlen(sarg) != (size_t)len) {
                    PyErr_SetString(PyExc_ValueError, "embedded null character");
                    RETURN_ERR_OCCURRED;
                }
                *p = sarg;
            }

So s does an additional comparison of the returned String to be sure, that there is no NUL character in UTF8 coded Strings. We should be able to avoid this at least for function and module names, as python does not allow NUL as part of Identifiers (see https://www.python.org/dev/peps/pep-3131/#specification-of-language-changes and https://docs.python.org/3.7/reference/lexical_analysis.html#identifiers )

AndreasGocht · 2020-05-13T09:00:35Z

The same applies to filenames on UNIX, (see https://unix.stackexchange.com/questions/38055/utf-8-filenames ). So we could avoid this as well.

AndreasGocht · 2020-05-17T12:38:11Z

#42

Flamefire · 2020-06-08T09:37:12Z

My proposal would be to use a simple, slim wrapper around a C-String:

class CString{
  CString(const char* s, size_t len); // Could be guarded by assert for tests
  CString(const char[] s); // With template to deduce size from raw strings (trivial)
  bool operator==(const CString&);
  bool operator!=(const CString&);
  size_t find(char) const;
  const char* c_str() const;
  size_t size() const;
}

This is very similar to string_view but focuses on/guarantees NULL-terminated strings. The implementation is incredibly simple so verifiable by just looking at it. Nothing really happening there. And the comparison operators save call sites from using the (IMO strange) C string comparison functions. Some interop functions with std::string could be added too as needed (e.g fast conversion, direct comparison, ...) but those are also very simple.

This then allows accepting the Python values with known size and passing them to functions (like region_begin) that currently take std::string but only pass them as c_str() down to Score-P and maybe compare them or take substrings.

If you agree with this design I'll create a PR with an implementation of this proposal

AndreasGocht · 2020-06-12T15:22:34Z

Sounds reasonable. Please go ahead.

Best,

Andreas

Flamefire · 2020-06-16T14:14:18Z

For some numbers: Replacing the std::strings by const char* where it is beneficial results in these numbers:

0.3920-0.4455, mean: 0.4115, median: 0.4073
0.9087-0.9889, mean: 0.9460, median: 0.9460
1.4548-1.5868, mean: 1.5129, median: 1.5108

0.3861-0.4372, mean: 0.4071, median: 0.4066
0.9111-1.0089, mean: 0.9551, median: 0.9547
1.4719-1.5940, mean: 1.5263, median: 1.5262

Top is with C-Strings for 100k, 300k and 500k iterations of the simplefunc benchmark (min, max, mean and median over 51 invocations) and bottom is current master using std::string. All using the cProfile instrumenter

So there is some effect although not much. The small case is even slower for some reason but I think this is an artifact from the small runtime.

I'd expect the effect to be more visible when memory pressure is higher especially as the benchmark uses only 1 function hence the memory allocator might return always the last used chunk which is faster than "real-world" use cases of random length functions

AndreasGocht · 2020-06-24T13:28:31Z

After merging #105 you might want to try again, and use s#. As the creation of region_names is only done, when necessary, this might save a lot of runtime.

AndreasGocht added the enhancement label May 12, 2020

AndreasGocht mentioned this issue Jun 24, 2020

Use the codeobject pointer to idetify regions #105

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid memory allocations #67

Avoid memory allocations #67

Flamefire commented May 11, 2020

AndreasGocht commented May 12, 2020

Flamefire commented May 12, 2020

AndreasGocht commented May 12, 2020 •

edited

Loading

Flamefire commented May 12, 2020 •

edited

Loading

AndreasGocht commented May 12, 2020

Flamefire commented May 12, 2020

AndreasGocht commented May 12, 2020 •

edited

Loading

Flamefire commented May 12, 2020

AndreasGocht commented May 13, 2020

AndreasGocht commented May 13, 2020

AndreasGocht commented May 17, 2020

Flamefire commented Jun 8, 2020

AndreasGocht commented Jun 12, 2020

Flamefire commented Jun 16, 2020

AndreasGocht commented Jun 24, 2020 •

edited

Loading

Avoid memory allocations #67

Avoid memory allocations #67

Comments

Flamefire commented May 11, 2020

AndreasGocht commented May 12, 2020

Flamefire commented May 12, 2020

AndreasGocht commented May 12, 2020 • edited Loading

Flamefire commented May 12, 2020 • edited Loading

AndreasGocht commented May 12, 2020

Flamefire commented May 12, 2020

AndreasGocht commented May 12, 2020 • edited Loading

Flamefire commented May 12, 2020

AndreasGocht commented May 13, 2020

Technical remark:

AndreasGocht commented May 13, 2020

AndreasGocht commented May 17, 2020

Flamefire commented Jun 8, 2020

AndreasGocht commented Jun 12, 2020

Flamefire commented Jun 16, 2020

AndreasGocht commented Jun 24, 2020 • edited Loading

AndreasGocht commented May 12, 2020 •

edited

Loading

Flamefire commented May 12, 2020 •

edited

Loading

AndreasGocht commented May 12, 2020 •

edited

Loading

AndreasGocht commented Jun 24, 2020 •

edited

Loading