Difference between llvmslicer and dg #107

Open
twang15 opened this issue Nov 10, 2016 · 7 comments

twang15 commented Nov 10, 2016

Initial question:

I am using llvmslicer. Aside from llvmslicer being based on the CFG and dg being based on the PDG, what are the other major differences between dg and llvmslicer? Does dg generate smaller slices?


twang15 commented Nov 10, 2016

Marek's reply:

You mean LLVMSlicer? (https://github.com/jirislaby/LLVMSlicer). The major difference is that dg has more precise and sound analyses (which can lead to smaller slices). The slicing algorithm itself produces more or less the same slices, since I haven't implemented the two-pass algorithm of Horwitz nor any other more precise variant (e.g. specialization slicing) yet. Actually, dg was designed as a replacement for LLVMSlicer in the tool Symbiotic (https://github.com/staticafi/symbiotic). You can find more detailed information about dg here: http://is.muni.cz/th/396236/fi_m/thesis.pdf


twang15 commented Nov 10, 2016

Yes, it is LLVMSlicer.
LLVMSlicer is based on Mark Weiser's classic algorithm. As noted by others (Interprocedural Slicing Using Dependence Graphs), this CFG-based algorithm has a "calling context" issue, but it is fixable (K. B. Gallagher. Some notes on interprocedural program slicing. In Source Code Analysis and Manipulation, 2004), and the result would then be equivalent to Horwitz's PDG/SDG-based algorithm.

When you say dg has more precise and sound analyses, in what sense? Is the pointer analysis of dg more precise?

mchalupa (Owner) commented

Yeah, well, our PDG-based algorithm has the same calling-context issue at the moment.
Yes, the pointer analysis is more advanced.

Dg has two pointer analyses: flow-insensitive and flow-sensitive. The flow-sensitive one is more precise by design, but it is still not well debugged (there is at least one bug I know about). The flow-insensitive one is basically the same as in LLVMSlicer, with these differences:

  • it uses UNKNOWN_OFFSET, which allows it to handle, for example, this issue in LLVMSlicer: Points-to analysis bug - accessing array elements jirislaby/LLVMSlicer#6
  • the UNKNOWN_OFFSET also makes it faster in some cases (e.g. when analyzing arrays), since we can use UNKNOWN_OFFSET to stand for all offsets from some range, which is a sound approximation. So where LLVMSlicer would conclude that p can point to x + 0, x + 4, x + 8, ..., x + 64 (and if the offset is > 64, the analysis goes unsound), we just say that p can point to x + UNKNOWN_OFFSET, staying sound and fast (since our points-to set has size 1 instead of some big number); see the small example after this list
  • it correctly handles a greater portion of LLVM than LLVMSlicer's analysis; for example, we have better support for memset and memcpy calls (copying whole blocks of memory with pointers), where LLVMSlicer can be unsound
  • we run on LLVM 3.4 - 3.9
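
To make the offset discussion above concrete, here is a minimal, hypothetical C++ snippet of the kind of code being analyzed (the names x, p, and getElem are purely illustrative and are not taken from either tool):

```cpp
// A 16-element int array: the valid byte offsets are 0, 4, ..., 60.
int x[16];

int *getElem(unsigned i) {
    // The index i is unknown at analysis time.  A byte-offset-based
    // points-to analysis must either enumerate every possible target
    // (x + 0, x + 4, ..., up to some bound, as described above for
    // LLVMSlicer) or collapse them into one element, x + UNKNOWN_OFFSET
    // (as dg does), keeping the points-to set at size 1 while staying sound.
    int *p = &x[i];
    return p;
}
```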


twang15 commented Nov 11, 2016

Thanks for sharing these details.
I have the following further questions.

  1. For pointer analysis, it is well known that Andersen's algorithm is practically precise. As noted here, http://pages.cs.wisc.edu/~fischer/cs701.f08/lectures/Lecture26.4up.pdf, "Both context-sensitive points-to analysis and flow-sensitive points-to analysis give little improvement over Andersen's analysis." So, which pointer analysis algorithm does dg use? Is there a research paper that systematically shows dg's pointer analysis doing better?

  2. Does dg support concurrent program slicing? Or is there a plan to support it? Both Mark Weiser's algorithm (which is also what LLVMSlicer implements) and the traditional PDG-based approach target sequential program slicing. There are also concurrent slicing algorithms (Nanda 2006, http://doi.acm.org/10.1145/1186632.1186636; Krinke 2003, http://doi.acm.org/10.1145/940071.940096).
  3. Why does the PDG algorithm in dg also have the "calling context" problem? I may resort to your thesis, but it would be great if you could explain it in a few sentences or point me to a quick reference for understanding the PDG algorithm you are using in dg.

mchalupa (Owner) commented

  • Ad 1) In dg we use a field-sensitive Andersen's analysis (the field sensitivity is configurable). There's no paper measuring only the pointer analysis, but the thesis I linked above contains measurements of the differences between LLVMSlicer and dg (I suppose the numbers may have changed a bit, since some work has been done on dg since the thesis).
  • Ad 2) Dg does not support slicing of concurrent programs. There may be some plans, but they are really no priority now as far as I can see (this may change in the future, though).
  • Ad 3) The calling context problem arises because we slice with simple backward reachability: we just compute the dependencies and then go backward over the edges as far as we can. In contrast, the two-pass algorithm of Horwitz et al. computes additional summary edges and then performs two passes over the PDG to avoid the calling context problem. Our PDG is the same as Horwitz's PDG (apart from the fact that we do not compute the summary edges, since we would not use them at the moment), but we do only one pass over all dependency edges in the graph, not two passes over different subsets of edges. That is why our algorithm currently coincides with the CFG-based one. The reason we use only simple backward reachability is simple: I just haven't gotten around to implementing the two-pass algorithm yet (specialization slicing could also be a very good feature). A rough sketch of this one-pass reachability is shown after this list.
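
As a minimal illustration of the one-pass backward reachability described in Ad 3), here is a hypothetical C++ sketch; the Node structure and edge names are invented for the example and are not dg's actual classes:

```cpp
#include <queue>
#include <set>
#include <vector>

// Hypothetical dependence-graph node: each node records the nodes it
// depends on (incoming data- and control-dependence edges).
struct Node {
    std::vector<Node *> dataDeps;
    std::vector<Node *> controlDeps;
};

// Slice by plain backward reachability: collect every node the slicing
// criterion transitively depends on.  All dependence edges (including
// inter-procedural ones) are followed in a single pass, so the result
// is context-insensitive -- exactly the "calling context" problem
// discussed above, which the two-pass algorithm with summary edges avoids.
std::set<Node *> backwardSlice(Node *criterion) {
    std::set<Node *> slice{criterion};
    std::queue<Node *> worklist;
    worklist.push(criterion);

    while (!worklist.empty()) {
        Node *n = worklist.front();
        worklist.pop();
        for (const auto *deps : {&n->dataDeps, &n->controlDeps})
            for (Node *dep : *deps)
                if (slice.insert(dep).second)  // not visited yet
                    worklist.push(dep);
    }
    return slice;
}
```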


twang15 commented Nov 13, 2016

  1. The paper "Which pointer analysis should I use?" compares five pointer analysis algorithms.

mchalupa (Owner) commented

Oh yeah, I meant there's no research paper comparing our pointer analyses.
