Negations #4
Comments
Overall, it makes sense, but the last part about using C and C' is still a bit vague.
This comment discusses how to apply Symlog to program repair by handling negations stratum by stratum. (Continuing from the fourth paragraph of the previous comment.)
One way to mitigate the search explosion is divide-and-conquer. In our case, we can divide the original Datalog program into strata and try to apply Symlog to each stratum. Let the
Now we discuss how to divide the program so that a stratum has the 'only-negative' property whenever possible, and how to handle a stratum that cannot have the property. We can give the property to as many strata as possible by placing the negative literals and the corresponding positive literals in different strata. Such a split is not always possible: if the negative and positive literals define the same IDB, they must be in the same partition. That partition is therefore nonmonotonic, which is incompatible with the requirements of Symlog's delta-debugging part. To handle this incompatibility, we can replace delta debugging with the method in the previous comment. That method probably has performance issues on large Datalog programs, but the issue may be greatly alleviated here because we apply it to much smaller strata. If a stratum is small enough, we can even use enumeration instead of SMT to solve the constraints of that method.
For the strata that have the 'only-negative' property, we can transform them to be positive and apply Symlog to them. But before applying Symlog a stratum Regardless of whether we apply Symlog or Symlog without delta-debugging to
One thing we have not discussed yet is which EDBs in each stratum should be symbolized. One method is to allow every kind of 'EDB' in a stratum to have symbolic facts, and then drop the symbolic facts that are not used to derive the tuples in positive queries. The dropped facts are those that are not in any 1-minimal set for the target tuples, or not in the solution of the method in the previous comment.
This method may not be efficient enough if the stratum is not small. A better option may be to first analyze which kinds of facts are required for the target tuples and symbolize only those. (This part is not clear enough. We may need to discuss it further.)
Regardless of which method we adopt, repair patches can always be generated. It is difficult, however, to guarantee that the patches are optimal (e.g., a minimum number of changed lines), since we do not require the queries generated from a stratum to satisfy any optimization specification, and the queries are broadcast layer by layer. As long as the repair patches are reasonable, it is acceptable even if they are not optimal.
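To make the idea of dividing a program into strata concrete, here is a minimal sketch of the standard textbook stratification procedure. It is not necessarily the partitioning proposed in this comment, and the rule encoding and the predicate names (`tainted`, `sanitized`, `bug`, and so on) are hypothetical. Each rule is a triple of head predicate, positive body predicates, and negated body predicates.

```python
def stratify(rules):
    """Assign a stratum number to every predicate.

    rules: list of (head, positive_body_preds, negated_body_preds).
    A head must sit at least as high as its positive body predicates
    and strictly higher than its negated body predicates.
    Raises ValueError if there is recursion through negation.
    """
    preds = set()
    for head, pos, neg in rules:
        preds |= {head, *pos, *neg}
    stratum = {p: 0 for p in preds}
    limit = len(preds)  # a valid stratum number is at most limit - 1
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            need = max(
                [stratum[head]]
                + [stratum[p] for p in pos]
                + [stratum[n] + 1 for n in neg]
            )
            if need > stratum[head]:
                if need >= limit:
                    raise ValueError("program is not stratifiable")
                stratum[head] = need
                changed = True
    return stratum


# Hypothetical toy analysis: a bug is reported for tainted, unsanitized uses.
RULES = [
    ("tainted", ["source"], []),
    ("sanitized", ["sanitizer"], []),
    ("bug", ["tainted", "use"], ["sanitized"]),
]
STRATA = stratify(RULES)
```

In this toy program `bug` ends up one stratum above `sanitized`, so the negative literal and the rule defining it are separated. A pair of rules such as `p :- not q` and `q :- not p` makes `stratify` raise, since no such separation exists.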
In this issue, we discuss how to handle negations when applying Symlog to program repair.
Symlog has symbolic signs, which indicate whether the facts associated with them are positive or negative. A negative sign means the fact is removed. If the given Datalog program is positive, removing facts from the EDB database can only cause fewer tuples to be generated. But if the Datalog program contains negation, removing facts may cause more tuples to be derived. So, unlike positive Datalog, where the derived tuples stay within a known scope, the tuples produced by a stratified Datalog program are difficult to estimate if we remove facts arbitrarily.
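The effect of removing a fact under negation can be seen on a tiny example. The following is a minimal sketch with a hypothetical two-stratum program (the predicates `var`, `source`, `tainted`, and `safe` are made up for illustration), evaluated directly in Python rather than with Symlog.

```python
def evaluate(var, source):
    """Evaluate a hypothetical two-stratum program:

        tainted(X) :- source(X).
        safe(X)    :- var(X), not tainted(X).
    """
    tainted = set(source)                         # stratum 1
    safe = {x for x in var if x not in tainted}   # stratum 2, uses negation
    return tainted, safe


# All facts present: only b is safe.
_, safe_full = evaluate(var={"a", "b"}, source={"a"})

# The fact source(a) is removed: safe gains the tuple a.
_, safe_reduced = evaluate(var={"a", "b"}, source=set())
```

Removing `source(a)` shrinks `tainted` but grows `safe`, which is the non-monotonic behaviour described above.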
Knowing the derived tuples is crucial to program repair, since some derived tuples represent detected bugs, and the goal of repair is to remove or add facts such that those tuples can no longer be derived. In principle, Symlog is capable of computing all possible produced tuples and their associated constraints. However, as in conventional symbolic execution, the search space grows exponentially with the size of the EDB database, since the generated tuples may vary with each subset of the EDB database.
To mitigate the search explosion, we compute an over-approximation of the derived tuples. We do so by discarding all negative literals in the given stratified Datalog program P, resulting in a new program P'. Since P' has no negative literals, the set of tuples generated from it is a superset of that generated from P given the same facts. Moreover, P' is positive and therefore monotonic, so the set of tuples derived with all facts is a superset of that derived with fewer facts. Hence the set of tuples produced from P' with all facts is an over-approximation of all possible tuples generated from P with an arbitrary subset of the facts. We denote by S the set of tuples that represent detected bugs; S is produced by running P' under all facts. If executing P does not generate any tuple in S, the user program is said to be repaired. To eliminate the tuples in S, we can assert the negation of their associated constraints (C) and solve these assertions.
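The over-approximation argument can be checked by brute force on a small case. The sketch below uses a hypothetical ground (propositional) program with made-up atoms `source`, `use`, `sanitizer`, `sanitized`, and `bug`; the evaluator is a plain fixpoint loop written for this illustration and is not Symlog's implementation.

```python
from itertools import combinations

# P, given stratum by stratum. Each rule is (head, positive body, negated body).
#   sanitized :- sanitizer.
#   bug       :- source, use, not sanitized.
P = [
    [("sanitized", ("sanitizer",), ())],
    [("bug", ("source", "use"), ("sanitized",))],
]
FACTS = {"source", "use", "sanitizer"}


def evaluate(strata, facts):
    """Evaluate ground rules stratum by stratum up to a fixpoint."""
    known = set(facts)
    for rules in strata:
        changed = True
        while changed:
            changed = False
            for head, pos, neg in rules:
                fires = all(p in known for p in pos) and not any(
                    n in known for n in neg
                )
                if fires and head not in known:
                    known.add(head)
                    changed = True
    return known


def drop_negation(strata):
    """Build P' by discarding every negative literal."""
    return [[(head, pos, ()) for head, pos, _ in rules] for rules in strata]


def all_subsets(facts):
    items = sorted(facts)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


# Tuples of P' under all facts: the claimed upper bound.
upper = evaluate(drop_negation(P), FACTS)

# P under any subset of the facts never leaves that bound.
bound_holds = all(evaluate(P, sub) <= upper for sub in all_subsets(FACTS))
```

Here `bug` is not derived by P under all facts, yet it is derived once `sanitizer` is removed, and it is already contained in `upper`. This is why S has to be taken from P' rather than from a single run of P.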
There is a problem, however: the constraints C are collected by running P', not P. To get the constraints for running P, we need to add more constraints to C. The added constraints (C') describe the existence of the tuples that were negated in P and discarded in P'. The conjunction of C and (not C') is the full set of constraints for S. A solution of (not (C and (not C'))) corresponds to a repair patch of the analyzed user program. C' is obtained in the same way as C, so it may itself consist of two such parts, and so on recursively.
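A small worked instance may help with the roles of C and C'. The sketch below assumes a hypothetical program `sanitized :- sanitizer.` and `bug :- source, use, not sanitized.`, treats each EDB fact as a Boolean sign (True means kept, False means removed), and enumerates assignments instead of calling an SMT solver. All names are illustrative.

```python
from itertools import product

FACTS = ("source", "use", "sanitizer")


def c(sign):
    """C: constraint collected from P' for the tuple bug."""
    return sign["source"] and sign["use"]


def c_prime(sign):
    """C': constraint for the existence of the negated tuple sanitized."""
    return sign["sanitizer"]


def bug_derived(sign):
    """Direct stratified evaluation of P, used only as a cross-check."""
    sanitized = sign["sanitizer"]
    return sign["source"] and sign["use"] and not sanitized


def repairs():
    """All sign assignments that satisfy not (C and (not C'))."""
    found = []
    for values in product([True, False], repeat=len(FACTS)):
        sign = dict(zip(FACTS, values))
        if not (c(sign) and not c_prime(sign)):
            found.append(sign)
    return found


REPAIRS = repairs()
```

Of the eight assignments, only the one that keeps `source` and `use` and removes `sanitizer` satisfies C and (not C'), so the other seven are repairs, and `bug_derived` agrees on every one of them. In a larger program, C' would itself be built from a positive part and a negated part in the same way.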