Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negations #4

Open
rainLiuplus opened this issue Nov 30, 2022 · 2 comments
Open

Negations #4

rainLiuplus opened this issue Nov 30, 2022 · 2 comments

Comments

@rainLiuplus
Copy link
Collaborator

rainLiuplus commented Nov 30, 2022

We discuss how to handle negations when applying Symlog to program repair in this issue.

Symlog has symbolic signs, which indicate whether the facts associated with themselves are positive or negative. Negative represents the fact is removed. Removing facts from the EDB database will cause fewer tuples generated if the given Datalog program is positive. But if the Datalog program contains negation, removing facts may cause more tuples derived. So, unlike positive Datalog where derived tuples are within some scope, tuples produced by stratified Datalog programs are difficult to estimate if we remove facts arbitrarily.

Possessing derived tuples is crucial to program repair since some derived tuples represent some detected bugs, and the goal of repair is to remove or add some facts such that those tuples cannot be derived. In principle, Symlog is capable of computing all possible produced tuples and their associated constraints. However, similar to conventional symbolic execution, the search space grows exponentially with the size of the EDB database, since the generated tuples may vary with each subset of the EDB database.

To mitigate the search explosion, we compute an over-approximation of derived tuples. We do so by discarding all negative literals in the given stratified Datalog program P, resulting in a new program P'. Since P' does not have any negative literal, the set of tuples generated from it is a superset of that from P given the same facts. Besides, P' is positive and monotonic, thus the set of derived tuples with all facts is a superset of that with fewer facts. So, the set of tuples produced from P' with all facts is an over-approximation of all possible tuples generated from P with an arbitrary subset of facts. We denote the set of tuples that represent some detected bugs as S. S is produced by running P' under all facts. If executing P does not generate any tuple in S, the user program is said to be repaired. To eliminate tuples in S, we can assert negation of their associated constraints (C) and solve these assertions.

But here is a problem, the constraints are collected by running P'. To get the constraints of running P, we need to add more constraints on C. The added constraints (C') are for the existence of some tuples which are negated and discarded. The conjunction of C and (not C') is the set of full constraints for S. The solution of (not (C and (not C'))) corresponds to repair patches of the analyzed user program. The set of C' is obtained in the same way, which may also consist of two parts like C and C', and so on and so forth.

@mechtaev
Copy link
Collaborator

mechtaev commented Dec 1, 2022

Overall, it makes sense, but the last part of using C and C' is still a bit vague.

@rainLiuplus
Copy link
Collaborator Author

rainLiuplus commented Dec 5, 2022

The following section discusses how to apply Symlog to program repair by handling negations stratum by stratum.

(Continuing from the fourth paragraph of the previous comment)

One way to mitigate search explosion is to use divide-and-conquer. In our case, we can divide the original Datalog program into strata and try to apply Symlog on each stratum. Let the $ith$ stratum be denoted as $P_i$. Each $P_i$ is semi-positive according to the definition of stratified Datalog, i.e. allow to only have negation in front of EDB relations. However, because of negations, semi-positive Datalog generally is not monotonic, thus we cannot directly apply Symlog to a stratum. But there is a case that the stratum is monotonic even with negations, that is: the literals which have negations in front of themselves only occur as negative in a stratum. The negative literals can be replaced with complementary positive literals, resulting a positive and monotonic stratum $P_i'$. So we can apply Symlog to $P_i'$. But before we transform $P_i$ to $P_i'$, we need to ensure $P_i$ has the property first.

Now we discuss how to divide the program to make the stratum have the 'only-negative' property if possible and how to handle the stratum that cannot have the property. We can make as many strata as possible have the property by dividing the negative literals and the corresponding positive literals into different strata. But such a split is not always possible. If the negative and positive literals define the same IDB, they must be in the same partition. This partition is therefore nonmonotonic, which is incompatible with requirements of Symlog's delta-debugging part. To handle this incompatibility, we can replace the delta-debugging with the method in the previous comment. The method probably has performance issue when dealing with large Datalog program. But the issue may be greatly alleviated here because we use it for much smaller strata. If the stratum is small enough, we can even use enumeration instead of SMT to solve the constraints of the method in the previous comment.

For the strata have 'only-negative' property, we can transform them to be positive and apply Symlog to them. But before applying Symlog a stratum $P_i'$, we need to figure out 'EDB' facts and domains of symbols of $P_i'$. The 'EDB' facts relating to original positive literals can be easily obtained by evaluating the original Datalog program $P$; they are the derived input tuples of $P_i$. For facts relating to original negated literals, there are in principle an infinite number of them, since the number of nonexistent tuples is always infinite. It is not a real problem because we do not need all the infinite facts. We can only set the derived tuples of the complementary positive literals during evaluating $P$ as facts ( $F_N$ ). If some fact is required to be added/removed to/from $F_N$ for program repair, the queries for non-existence/existence of the corresponding tuples of original negated literals will be broadcasted to the next stratum. Some query, like non-existence of some tuple, may already be satisfied, but there is no harm to check it again in the next stratum. After acquiring the 'EDB' facts, we can construct domains of symbols in $P_i'$ as usual, then apply Symlog to repair.

Regardless of whether we apply Symlog or Symlog without delta-debugging to $P_i'$ or $P_i$, some of the "EDB" facts of it will be modified, and these modifications will be broadcasted as queries to the next stratum. Now we discuss how to handle these queries when using Symlog for repair. if a query asserts non-existence of some tuple, we can use delta-debugging or the method mentioned in the previous comment to locate and then change the 'EDB' facts that derive the target tuple; if a query asserts existence of some tuple, we can require that the tuple generated by symbolic facts be equal to the tuple in the query. It is possible that some "EDB" facts need to be deleted and retained at the same time to satisfy all negative/positive queries. In this case, we can run the method for handling negative queries again to get different to-be-changed facts until they do not cause conflicts. We can do so because typically there is not just one way to remove an unwanted tuple. If we still cannot find compatible solutions after re-running the method for negative queries a certain number of times or after examining all possible modifications, we go back to previous stratum and generate different queries. We follow this strategy to generate queries from top stratum to bottom stratum. The changes for satisfying queries of the bottom stratum correspond to repair patches for the user program.

One thing we have not discussed yet is which EDBs in each stratum should be symbolized. One method is: we allow all kinds of 'EDB' in a stratum to have symbolic facts, then drop the symbolic facts which are not used for deriving the tuples in positive queries. These removed facts are the facts that are not in any 1-minimal set for target tuples or the solution of method in the previous comment. This method may not be efficient enough if the stratum is not small. A better option may be to first analyze the kind of facts required for target tuples and symbolize only those. (This part is not clear enough. We may need to discuss it further)

Regardless of which method we adopt, repair patches can always be generated. But it is difficult to guarantee the repair patches are optimal (e.g., minimum lines of changes), since we did not require the queries generated from a stratum to satisfy some optimization specification and the queries are broadcasted layer by layer. However, as long as the repair patches are reasonable, it is acceptable even if they are not optimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants