Difference between llvmslicer and dg #107

Open
twang15 opened this issue Nov 10, 2016 · 7 comments

twang15 commented Nov 10, 2016

Initial question:

I am using llvmslicer. Aside from llvmslicer being based on the CFG and dg being based on the PDG, what are the other major differences between dg and llvmslicer? Does dg generate smaller slices?


twang15 commented Nov 10, 2016

Marek's reply:

You mean LLVMSlicer? (https://github.com/jirislaby/LLVMSlicer). The major difference is that dg has more precise and sound analyses (which can lead to smaller slices). The slicing algorithm itself produces more or less the same slices, since I haven't implemented the two-pass algorithm of Horwitz nor any other more precise variant (e.g. specialization slicing) yet. Actually, dg was designed as a replacement for LLVMSlicer in the tool Symbiotic (https://github.com/staticafi/symbiotic). You can find more detailed information about dg here: http://is.muni.cz/th/396236/fi_m/thesis.pdf


twang15 commented Nov 10, 2016

Yes, it is LLVMSlicer.
LLVMSlicer is based on Mark Weiser's classic algorithm. As noted by others (Interprocedural Slicing Using Dependence Graphs), this CFG-based algorithm has a "calling context" issue, but it is fixable (K. B. Gallagher. Some notes on interprocedural program slicing. In Source Code Analysis and Manipulation, 2004), and the result would then be equivalent to Horwitz's PDG/SDG-based algorithm.

When you say dg has more precise and sound analyses, in what sense? Is the pointer analysis of dg more precise?

mchalupa (Owner) commented

Yeah, well, our PDG-based algorithm has the same calling-context issue at the moment.
Yes, the pointer analysis is more advanced.

Dg has two pointer analyses: flow-insensitive and flow-sensitive. The flow-sensitive one is more precise by design, but it is still not well debugged (there is at least one bug I know about). The flow-insensitive one is basically the same as in LLVMSlicer, with these differences:

  • it uses UNKNOWN_OFFSET, which allows it to handle, for example, this issue in LLVMSlicer: Points-to analysis bug - accessing array elements jirislaby/LLVMSlicer#6
  • the UNKNOWN_OFFSET also makes it faster in some cases (e.g. when analyzing arrays), since we can use UNKNOWN_OFFSET to stand for all offsets from some range, which is a sound approximation. So where LLVMSlicer would conclude that p can point to x + 0, x + 4, x + 8, ..., x + 64 (and if the offset is > 64, the analysis goes unsound), we just say that p can point to x + UNKNOWN_OFFSET, staying sound and fast (since our points-to set has size 1 instead of some big number); see the small example after this list
  • it correctly handles a greater portion of LLVM than LLVMSlicer's analysis; for example, we have better support for memset and memcpy calls (copying whole blocks of memory with pointers), where LLVMSlicer can be unsound
  • we run on LLVM 3.4 - 3.9
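
To make the offset discussion above concrete, here is a minimal, hypothetical C++ snippet of the kind of code being analyzed (the names x, p, and getElem are purely illustrative and are not taken from either tool):

```cpp
// A 16-element int array: the valid byte offsets are 0, 4, ..., 60.
int x[16];

int *getElem(unsigned i) {
    // The index i is unknown at analysis time.  A byte-offset-based
    // points-to analysis must either enumerate every possible target
    // (x + 0, x + 4, ..., up to some bound, as described above for
    // LLVMSlicer) or collapse them into one element, x + UNKNOWN_OFFSET
    // (as dg does), keeping the points-to set at size 1 while staying sound.
    int *p = &x[i];
    return p;
}
```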


twang15 commented Nov 11, 2016

Thanks for sharing these details.
I have the following further questions.

  1. For pointer analysis, it is well known that Andersen's algorithm is practically precise. As noted here, http://pages.cs.wisc.edu/~fischer/cs701.f08/lectures/Lecture26.4up.pdf, "Both context-sensitive points-to analysis and flow-sensitive points-to analysis give little improvement over Andersen's analysis." So, which pointer analysis algorithm does dg use? Is there a research paper that systematically shows dg's pointer analysis doing better?

  2. Does dg support concurrent program slicing? Or is there a plan to support it? Both Mark Weiser's algorithm (which is also what LLVMSlicer implements) and the traditional PDG-based approach target sequential program slicing. There are also concurrent slicing algorithms (Nanda 2006, http://doi.acm.org/10.1145/1186632.1186636; Krinke 2003, http://doi.acm.org/10.1145/940071.940096).
  3. Why does the PDG algorithm in dg also have the "calling context" problem? I may resort to your thesis, but it would be great if you could explain it in a few sentences or point me to a quick reference for understanding the PDG algorithm you are using in dg.

mchalupa (Owner) commented

  • Ad 1) In dg we use a field-sensitive Andersen's analysis (the field sensitivity is configurable). There's no paper measuring only the pointer analysis, but the thesis I linked above contains measurements of the differences between LLVMSlicer and dg (I suppose the numbers may have changed a bit, since some work has been done on dg since the thesis).
  • Ad 2) Dg does not support slicing of concurrent programs. There may be some plans, but they are really no priority now as far as I can see (this may change in the future, though).
  • Ad 3) The calling context problem arises because we slice with simple backward reachability: we just compute the dependencies and then go backward over the edges as far as we can. In contrast, the two-pass algorithm of Horwitz et al. computes additional summary edges and then performs two passes over the PDG to avoid the calling context problem. Our PDG is the same as Horwitz's PDG (apart from the fact that we do not compute the summary edges, since we would not use them at the moment), but we do only one pass over all dependency edges in the graph, not two passes over different subsets of edges. That is why our algorithm currently coincides with the CFG-based one. The reason we use only simple backward reachability is simple: I just haven't gotten around to implementing the two-pass algorithm yet (specialization slicing could also be a very good feature). A rough sketch of this one-pass reachability is shown after this list.
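
As a minimal illustration of the one-pass backward reachability described in Ad 3), here is a hypothetical C++ sketch; the Node structure and edge names are invented for the example and are not dg's actual classes:

```cpp
#include <queue>
#include <set>
#include <vector>

// Hypothetical dependence-graph node: each node records the nodes it
// depends on (incoming data- and control-dependence edges).
struct Node {
    std::vector<Node *> dataDeps;
    std::vector<Node *> controlDeps;
};

// Slice by plain backward reachability: collect every node the slicing
// criterion transitively depends on.  All dependence edges (including
// inter-procedural ones) are followed in a single pass, so the result
// is context-insensitive -- exactly the "calling context" problem
// discussed above, which the two-pass algorithm with summary edges avoids.
std::set<Node *> backwardSlice(Node *criterion) {
    std::set<Node *> slice{criterion};
    std::queue<Node *> worklist;
    worklist.push(criterion);

    while (!worklist.empty()) {
        Node *n = worklist.front();
        worklist.pop();
        for (const auto *deps : {&n->dataDeps, &n->controlDeps})
            for (Node *dep : *deps)
                if (slice.insert(dep).second)  // not visited yet
                    worklist.push(dep);
    }
    return slice;
}
```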


twang15 commented Nov 13, 2016

  1. The paper "Which pointer analysis should I use?" compares five pointer analysis algorithms.

mchalupa (Owner) commented

Oh yeah, I meant there's no research paper comparing our pointer analyses.
