First commit in the new EvoSQL repo

SERG-Delft · Sep 1, 2017 · commit 1015196

Showing 967 changed files with 453,561 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,143 @@
+evaluation/mem
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+# User-specific stuff:
+.idea/**/workspace.xml
+.idea/**/tasks.xml
+.idea/dictionaries
+
+# Sensitive or high-churn files:
+.idea/**/dataSources/
+.idea/**/dataSources.ids
+.idea/**/dataSources.xml
+.idea/**/dataSources.local.xml
+.idea/**/sqlDataSources.xml
+.idea/**/dynamic.xml
+.idea/**/uiDesigner.xml
+
+# Gradle:
+.idea/**/gradle.xml
+.idea/**/libraries
+
+# CMake
+cmake-build-debug/
+
+# Mongo Explorer plugin:
+.idea/**/mongoSettings.xml
+
+## File-based project format:
+*.iws
+
+## Plugin-specific files:
+
+# IntelliJ
+out/
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Cursive Clojure plugin
+.idea/replstate.xml
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+fabric.properties
+
+
+
+evaluation/scenarios/*-process*/
+evaluation/scenarios/scenariosRQ1evaluation/
+evaluation/scenarios/scenariosTimeBudgetEval/
+
+results.csv
+create_db*.log
+.DS_Store
+
+instrumented-hsqldb/customers.csv
+instrumented-hsqldb/products.csv
+instrumented-hsqldb/Zulu.csv
+instrumented-hsqldb/test.data
+instrumented-hsqldb/test.log
+instrumented-hsqldb/test.properties
+instrumented-hsqldb/test.script
+instrumented-hsqldb/test3.lobs
+instrumented-hsqldb/test3.properties
+instrumented-hsqldb/test3.script
+instrumented-hsqldb/test.lck
+
+testdb*
+.settings/
+.classpath
+
+mem/
+
+# Java
+*.class
+
+# BlueJ files
+*.ctxt
+
+# Mobile Tools for Java (J2ME)
+.mtj.tmp/
+
+# Package Files #
+*.jar
+*.war
+*.ear
+
+# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
+hs_err_pid*
+
+# Eclipse
+.metadata
+bin/
+tmp/
+*.tmp
+*.bak
+*.swp
+*~.nib
+local.properties
+.loadpath
+.recommenders
+
+# Eclipse Core
+.project
+
+# External tool builders
+.externalToolBuilders/
+
+# Locally stored "Eclipse launch configurations"
+*.launch
+
+# PyDev specific (Python IDE for Eclipse)
+*.pydevproject
+
+# CDT-specific (C/C++ Development Tooling)
+.cproject
+
+# Java annotation processor (APT)
+.factorypath
+
+# PDT-specific (PHP Development Tools)
+.buildpath
+
+# sbteclipse plugin
+.target
+
+# Tern plugin
+.tern-project
+
+# TeXlipse plugin
+.texlipse
+
+# STS (Spring Tool Suite)
+.springBeans
+
+# Code Recommenders
+.recommenders/
diff --git a/.idea/.name b/.idea/.name
diff --git a/.idea/compiler.xml b/.idea/compiler.xml
diff --git a/.idea/misc.xml b/.idea/misc.xml
diff --git a/.idea/modules.xml b/.idea/modules.xml
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
diff --git a/documentation.md b/documentation.md
@@ -0,0 +1,62 @@
+# Thesis Documentation
+
+## Choices made
+
+### Logical AND/OR optimization
+AND/OR optimization (short-circuit evaluation) has been turned off so that every comparison is evaluated.
+This is necessary to get a full picture of the distance.
+One could argue that when either side of an OR is TRUE, the distance is always 0 and the other side need not be evaluated. However, if this OR is negated afterwards, the OR may need to resolve to FALSE, which requires the distances of both sides.
+Hence, the more comparisons we evaluate, the better.
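A minimal Python sketch of why short-circuiting hides information. This is not HSQLDB's Java code: `eval_or` and the comparison labels are hypothetical, and each comparison is modeled as a label plus a zero-argument check.

```python
from typing import Callable, List, Tuple

# A comparison is modeled as (label, check); purely illustrative, not HSQLDB's API.
Comparison = Tuple[str, Callable[[], bool]]


def eval_or(comparisons: List[Comparison], short_circuit: bool) -> Tuple[bool, List[str]]:
    """Evaluate an OR and report which comparisons were actually evaluated."""
    evaluated: List[str] = []
    result = False
    for label, check in comparisons:
        evaluated.append(label)
        if check():
            result = True
            if short_circuit:
                # Stop as soon as one side is TRUE; later sides are never seen.
                break
    return result, evaluated


comparisons = [("a > 1", lambda: True), ("b = 2", lambda: False)]
print(eval_or(comparisons, short_circuit=True))   # (True, ['a > 1'])
print(eval_or(comparisons, short_circuit=False))  # (True, ['a > 1', 'b = 2'])
```

With short-circuiting, `b = 2` is never evaluated, so no distance would be available for it if a surrounding NOT later requires the OR to become FALSE.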
+
+### Query parsing AND/OR optimization
+When HSQLDB parses queries, it calls a function `decomposeAndConditions` that optimizes the expressions by splitting AND conditions into separate expressions. This breaks the expression tree that we want to work with, so the call is now disabled. There is also a `decomposeOrConditions` method, which we may need to disable later.
+
+- RangeVariableResolver.java (ca. L160): Use `queryConditions.add` directly instead of `decomposeAndConditions`
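To make the effect concrete, here is a rough Python analogue. It assumes `decomposeAndConditions` flattens nested ANDs into a list of separate conditions; the classes `Cmp` and `And` and the function `decompose_and` are hypothetical stand-ins, not HSQLDB's expression classes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Cmp:
    """A leaf comparison, kept as text for illustration."""
    text: str


@dataclass
class And:
    left: "Node"
    right: "Node"


Node = Union[Cmp, And]


def decompose_and(node: Node, out: List[Node]) -> List[Node]:
    """Flatten nested ANDs into a list of conjuncts, discarding the tree shape."""
    if isinstance(node, And):
        decompose_and(node.left, out)
        decompose_and(node.right, out)
    else:
        out.append(node)
    return out


condition = And(Cmp("a = 1"), And(Cmp("b > 2"), Cmp("c < 3")))

# Decomposed: three separate conditions, the AND nodes are gone.
print([c.text for c in decompose_and(condition, [])])  # ['a = 1', 'b > 2', 'c < 3']

# Added directly: one condition that still holds the whole tree.
query_conditions: List[Node] = [condition]
print(len(query_conditions))  # 1
```

Adding the condition directly keeps the AND nodes, which the instrumentation needs in order to combine child distances.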
+
+### Indexing
+Indexing in HSQLDB has been turned off.
+We are not certain that it has been disabled everywhere, but the changes we made to HSQLDB are:
+
+- RangeVariableResolver.java (ca. L1420): `currentIndex` is now always null.
+- RangeVariableResolver.java (ca. L1290): the method `setEqualityConditions` no longer does anything.
+- RangeVariableResolver.java (ca. L190 and L200): index is always 0.
+
+### Defining distance
+To find the best fixture (a database state, i.e. an individual) in our genetic algorithm, we need to be able to calculate the “distance” of every fixture.
+Our current implementation, which only handles single-table queries, defines this distance as the minimum row distance, where the row distance measures how far a row is from satisfying the condition tree.
+In other words, we take the row in the table that is closest to satisfying the query condition, and the distance of that row is the distance of the individual.
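A minimal sketch of this definition, assuming a row-distance function is already available and follows the sign convention described under "Calculating distance" (positive while a row fails). `fixture_distance`, `age_over_30` and returning infinity for an empty table are illustrative assumptions, not part of the implementation.

```python
from typing import Callable, Dict, Sequence

Row = Dict[str, int]


def fixture_distance(table: Sequence[Row], row_distance: Callable[[Row], float]) -> float:
    """Distance of an individual: the smallest row distance in its table."""
    if not table:
        # Assumption: no rows means no row can ever satisfy the condition.
        return float("inf")
    return min(row_distance(row) for row in table)


# Hypothetical condition "age > 30" on integer ages: 31 - age is positive
# exactly when the row fails, and non-positive when it passes.
def age_over_30(row: Row) -> float:
    return 31 - row["age"]


print(fixture_distance([{"age": 18}, {"age": 25}, {"age": 29}], age_over_30))  # 2
```

The row with age 29 is closest to passing, so its distance becomes the distance of the whole fixture.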
+
+### Calculating distance
+To calculate distance in the instrumented HSQLDB, we ask the top Comparison for its distance.
+This Comparison then asks all its children for their distances and combines them depending on its operation.
+If a Comparison evaluates to false, its distance is greater than 0; the number indicates how far it is from being true.
+If a Comparison evaluates to true, its distance is less than 0; the number then indicates how far it is from being false.
+
+- AND: Sum the distances of the children
+- OR: Take the minimum of the children's distances
+- NOT: Negate the distance of the only child
+- OTHER: Calculate the distance between the left and right value
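The combination rules above can be sketched as a recursive function over a small condition tree. The node classes are hypothetical Python stand-ins for the instrumented Comparisons, and leaf ("OTHER") distances are taken as already computed, since the text does not fix a formula for them.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """An OTHER comparison whose left/right distance is already computed."""
    distance: float


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


@dataclass
class Not:
    child: "Node"


Node = Union[Leaf, And, Or, Not]


def distance(node: Node) -> float:
    """Positive: FALSE, how far from TRUE. Negative: TRUE, how far from FALSE."""
    if isinstance(node, Leaf):
        return node.distance
    if isinstance(node, And):
        return sum(distance(c) for c in node.children)
    if isinstance(node, Or):
        return min(distance(c) for c in node.children)
    if isinstance(node, Not):
        return -distance(node.child)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# NOT (a OR b), with a 4 and b 2 away from being true:
print(distance(Not(Or([Leaf(4), Leaf(2)]))))  # -2
```

The sketch mirrors the rules exactly as written. Because AND sums signed values, one false child (3) and one strongly true child (-5) sum to -2, which reads as TRUE even though the AND itself is FALSE.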
+
+## Instrumenting HSQLDB
+### Instrumenter
+The central static class of the instrumentation.
+It can be initialized externally and then captures all expression Comparisons that HSQLDB generates and that we have instrumented.
+It uses the current ExpressionTree (in `currentConditionNode.root`) to link every Comparison to its Expression in the tree.
+Gathering information (ExpressionInfo):
+
+1. The level of the node, meaning the expression depth. It is increased every time an AND has a parent OR, or an OR has a parent AND. If the parent has the same operator as the current node, both are on the same expression depth.
+2. The number of the node, which is the depth-first number in the expression tree.
+3. The current state of negativity, a Boolean that is inverted whenever a NOT node occurs (not used as of 29/12/2016). The idea behind this variable is to track when an expression is negated in the final top-level expression.
+
+When the Instrumenter detects that a row condition has been evaluated, because the node is the top node in the current condition tree, the Comparisons are linked to their children for the later distance calculation.
+Both gathering information and linking Comparisons are done in a depth-first recursive manner: first going down the left child and doing everything needed there, then, if necessary, doing the same for the right child.
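The information-gathering pass can be sketched as a recursive pre-order walk over a binary expression tree. `Expr`, `gather_info` and the field names are hypothetical Python stand-ins for the Java classes. Two details are assumptions, since the text only defines level changes for AND/OR: comparisons and NOT nodes inherit their parent's level, and the negation flag applies to the NOT node and its subtree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Expr:
    op: str  # "AND", "OR", "NOT" or a comparison such as "="
    left: Optional["Expr"] = None
    right: Optional["Expr"] = None


@dataclass
class ExpressionInfo:
    op: str
    level: int
    number: int
    negated: bool


def gather_info(root: Expr) -> List[ExpressionInfo]:
    """Depth-first (left before right) walk recording level, number and negation."""
    infos: List[ExpressionInfo] = []

    def visit(node: Expr, parent_op: Optional[str], level: int, negated: bool) -> None:
        # Assumption: only an AND under an OR (or an OR under an AND) starts a new level.
        if node.op in ("AND", "OR") and parent_op in ("AND", "OR") and node.op != parent_op:
            level += 1
        # Assumption: the flag flips at a NOT and applies to the NOT and its subtree.
        if node.op == "NOT":
            negated = not negated
        # The node's number is its position in the pre-order traversal.
        infos.append(ExpressionInfo(node.op, level, len(infos), negated))
        if node.left is not None:
            visit(node.left, node.op, level, negated)
        if node.right is not None:
            visit(node.right, node.op, level, negated)

    visit(root, None, 0, False)
    return infos


tree = Expr("OR", Expr("AND", Expr("="), Expr("<")), Expr(">"))
for info in gather_info(tree):
    print(info.number, info.op, info.level)  # 0 OR 0, 1 AND 1, 2 = 1, 3 < 1, 4 > 0
```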
+
+### HSQLDB codebase
+The HSQLDB source code has been altered; each alteration is marked by a preceding `//TUD_<FirstLast>` comment, such as `//TUD_JC`.
+Changes include:
+
+- Logical AND/OR optimization
+- Query parsing AND/OR optimization
+- Adding Comparisons to the Instrumenter (in the Expression classes)
+- Indexing turned off
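Because every change carries this marker, the modified locations can be listed mechanically. A small sketch, assuming the markers appear in `.java` files as `//TUD_` followed by the author's initials; the source path `instrumented-hsqldb/src` and the function names are hypothetical.

```python
import os
import re
from typing import Iterable, List, Tuple

# Matches markers such as "//TUD_JC" (a space after "//" is also tolerated).
TUD_MARKER = re.compile(r"//\s*TUD_(\w+)")


def markers_in_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, initials) for every TUD marker."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = TUD_MARKER.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


def find_tud_markers(source_root: str) -> List[Tuple[str, int, str]]:
    """Scan every .java file below source_root for TUD markers."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in sorted(filenames):
            if name.endswith(".java"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, initials in markers_in_lines(f):
                        results.append((path, lineno, initials))
    return results


if __name__ == "__main__":
    # Hypothetical location of the instrumented HSQLDB sources.
    for path, lineno, initials in find_tud_markers("instrumented-hsqldb/src"):
        print(f"{path}:{lineno} TUD_{initials}")
```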
+
diff --git a/evaluation/.gitignore b/evaluation/.gitignore
@@ -0,0 +1 @@
+/target/
diff --git a/evaluation/collectResults.py b/evaluation/collectResults.py
@@ -0,0 +1,59 @@
+import os
+
+# CSV files produced by each process folder, paired with the aggregated output they go to
+INPUT_FILES = ["results.csv", "coverageResults.csv", "dataOutput.csv"]
+OUTPUT_FILES = ["results.psv", "coverageResults.psv", "dataOutput.psv"]
+
+
+def appendCsv(csvPath, output, system, processNo, writeHeader):
+    """Append the rows of csvPath to output, prefixing each row with the system
+    name and process number. Returns True if the file existed and had a header."""
+    if not os.path.isfile(csvPath):
+        return False
+
+    with open(csvPath) as f:
+        lines = f.readlines()
+    if not lines:
+        return False
+
+    header, rows = lines[0], lines[1:]
+    # Print header row if needed
+    if writeHeader:
+        output.write("System|Process Number|" + header.rstrip("\r\n") + "\n")
+    for line in rows:
+        # Normalize line endings so a missing final newline cannot merge rows
+        output.write(system + "|" + processNo + "|" + line.rstrip("\r\n") + "\n")
+    return True
+
+
+def readScenarioResults(scenarioFolder, scenarioName, outputs, needsHeader):
+    # Get system and process no from scenarioName, e.g. "<system>-...process<N>"
+    system = scenarioName[:scenarioName.find("-")]
+    processNo = scenarioName[scenarioName.find("process") + len("process"):]
+
+    for i, (fileName, output) in enumerate(zip(INPUT_FILES, outputs)):
+        csvPath = os.path.join(scenarioFolder, scenarioName, fileName)
+        # Only proceed to the next file if this one exists
+        if not appendCsv(csvPath, output, system, processNo, needsHeader[i]):
+            return False
+        # Each output tracks its own header, so a missing file elsewhere
+        # can no longer cause a duplicated header row
+        needsHeader[i] = False
+    return True
+
+
+def collectResults(scenarioFolder, outputPrefix):
+    paths = [os.path.join(scenarioFolder, outputPrefix + name) for name in OUTPUT_FILES]
+    # Open all three outputs at once
+    with open(paths[0], "w") as output, open(paths[1], "w") as covOutput, \
+            open(paths[2], "w") as dataOutput:
+        outputs = [output, covOutput, dataOutput]
+        needsHeader = [True] * len(outputs)
+        # Loop through each process folder in a deterministic order
+        for scenario in sorted(os.listdir(scenarioFolder)):
+            if "process" in scenario:
+                readScenarioResults(scenarioFolder, scenario, outputs, needsHeader)
+
+
+if __name__ == '__main__':
+    collectResults("scenarios", "")