First commit in the new EvoSQL repo
mauricioaniche committed Sep 1, 2017
0 parents commit 1015196
Showing 967 changed files with 453,561 additions and 0 deletions.
143 changes: 143 additions & 0 deletions .gitignore
@@ -0,0 +1,143 @@
evaluation/mem
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff:
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/dictionaries

# Sensitive or high-churn files:
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.xml
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml

# Gradle:
.idea/**/gradle.xml
.idea/**/libraries

# CMake
cmake-build-debug/

# Mongo Explorer plugin:
.idea/**/mongoSettings.xml

## File-based project format:
*.iws

## Plugin-specific files:

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties



evaluation/scenarios/*-process*/
evaluation/scenarios/scenariosRQ1evaluation/
evaluation/scenarios/scenariosTimeBudgetEval/

results.csv
create_db*.log
.DS_Store

instrumented-hsqldb/customers.csv
instrumented-hsqldb/products.csv
instrumented-hsqldb/Zulu.csv
instrumented-hsqldb/test.data
instrumented-hsqldb/test.log
instrumented-hsqldb/test.properties
instrumented-hsqldb/test.script
instrumented-hsqldb/test3.lobs
instrumented-hsqldb/test3.properties
instrumented-hsqldb/test3.script
instrumented-hsqldb/test.lck

testdb*
.settings/
.classpath

mem/

# Java
*.class

# BlueJ files
*.ctxt

# Mobile Tools for Java (J2ME)
.mtj.tmp/

# Package Files #
*.jar
*.war
*.ear

# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*

# Eclipse
.metadata
bin/
tmp/
*.tmp
*.bak
*.swp
*~.nib
local.properties
.loadpath
.recommenders

# Eclipse Core
.project

# External tool builders
.externalToolBuilders/

# Locally stored "Eclipse launch configurations"
*.launch

# PyDev specific (Python IDE for Eclipse)
*.pydevproject

# CDT-specific (C/C++ Development Tooling)
.cproject

# Java annotation processor (APT)
.factorypath

# PDT-specific (PHP Development Tools)
.buildpath

# sbteclipse plugin
.target

# Tern plugin
.tern-project

# TeXlipse plugin
.texlipse

# STS (Spring Tool Suite)
.springBeans

# Code Recommenders
.recommenders/
1 change: 1 addition & 0 deletions .idea/.name

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 21 additions & 0 deletions .idea/compiler.xml
29 changes: 29 additions & 0 deletions .idea/misc.xml
11 changes: 11 additions & 0 deletions .idea/modules.xml
6 changes: 6 additions & 0 deletions .idea/vcs.xml
62 changes: 62 additions & 0 deletions documentation.md
@@ -0,0 +1,62 @@
# Documentation Thesis

## Choices made

### Logical AND/OR optimization
AND/OR optimization has been turned off so that we can see all comparisons.
This is necessary to get a full picture of the distance.
One could argue that for an OR, if one of the sides is TRUE, the distance will always be 0. However, if this OR is negated afterwards, the OR may need to resolve to FALSE, and then the distances of both sides matter.
Hence the more we know, the better.

### Query parsing AND/OR optimization
When HSQLDB parses queries, it calls a function `decomposeAndConditions` that performs some optimization on the expressions. This creates additional expressions, which breaks the expression tree that we want to work with, so the call is now disabled. There is also a `decomposeOrConditions` method, which we may need to disable later.

- RangeVariableResolver.java (ca. L160): Use `queryConditions.add` directly instead of `decomposeAndConditions`

### Indexing
Indexing in HSQLDB has been turned off.
We are not sure that it has been removed everywhere, but the changes we made to HSQLDB are:

- RangeVariableResolver.java (ca. L1420): `currentIndex` is now always null.
- RangeVariableResolver.java (ca. L1290): the method `setEqualityConditions` no longer does anything.
- RangeVariableResolver.java (ca. L190 and L200): index is always 0.

### Defining distance
To find the best fixture (database state, i.e. an individual) in our genetic algorithm, we need to be able to calculate the “distance” of every fixture.
Our current implementation, which only considers single-table solutions, defines the distance as the minimum row distance, where the row distance is how far a row is from passing the condition tree.
In other words, we look at the row in the table that is closest to passing the query condition, and the distance of this row is the distance of our individual.
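
The definition above can be sketched in a few lines of Python. This is only an illustration, not the code used in the instrumented HSQLDB: the names `fixture_distance` and `age_equals_30`, the example condition, and the choice to treat an empty table as infinitely far away are all assumptions made for the sketch.

```python
import math

def fixture_distance(rows, row_distance):
    """Distance of a fixture: the smallest row distance in the table.

    An empty table is treated as infinitely far away (an assumption of
    this sketch, not something the text specifies).
    """
    return min((row_distance(row) for row in rows), default=math.inf)

# Hypothetical row distance for the condition `age = 30`.
def age_equals_30(row):
    return abs(row["age"] - 30)

table = [{"age": 18}, {"age": 27}, {"age": 45}]
best = fixture_distance(table, age_equals_30)  # the row with age 27 is closest
```

Only the closest row determines the distance of the individual; the other rows do not contribute.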

### Calculating distance
In the instrumented HSQLDB, we calculate the distance by asking the top Comparison for its distance.
This Comparison then asks all its children and, depending on the kind of operation it represents, calculates its own distance.
If a Comparison evaluates to false, its distance will be greater than 0. The number indicates how far away it is from being true.
If a Comparison evaluates to true, its distance will be less than 0. The number then indicates how far away it is from being false.

- AND: Add distance of children
- OR: Take minimum of distance
- NOT: Negative distance of the only child
- OTHER: Calculate the distance between the left and right value
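
The four rules above can be read as a small recursive evaluation. The following Python sketch is a literal reading of that list, not the Java code in the instrumented HSQLDB. The class names are hypothetical, and the leaf rule for an integer `left < right` comparison (positive when false, negative when true) is an assumption chosen to match the sign convention described above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LessThan:
    """Hypothetical leaf comparison `left < right` over integers."""
    left: int
    right: int

    def distance(self) -> int:
        diff = self.left - self.right
        # False (diff >= 0): positive distance; true (diff < 0): negative distance.
        return diff + 1 if diff >= 0 else diff

@dataclass
class And:
    children: List[object]

    def distance(self) -> int:
        # AND: add the distances of the children.
        return sum(child.distance() for child in self.children)

@dataclass
class Or:
    children: List[object]

    def distance(self) -> int:
        # OR: take the minimum distance of the children.
        return min(child.distance() for child in self.children)

@dataclass
class Not:
    child: object

    def distance(self) -> int:
        # NOT: negate the distance of the only child.
        return -self.child.distance()

# NOT (1 < 2 OR 5 < 3): the OR is true, so the negation is false.
negated_or = Not(Or([LessThan(1, 2), LessThan(5, 3)]))
```

In this sketch `negated_or.distance()` is positive, which illustrates why both sides of an OR are evaluated: once the OR is negated, the distance of the side that was true determines how far the whole expression is from passing.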

## Instrumenting HSQLDB
### Instrumenter
The static leading class.
This can be initialized externally and will then catch all expression Comparisons that are generated by HSQLDB and instrumented by us.
It uses the current ExpressionTree (in currentConditionNode.root) to link every Comparison to its Expression in the tree.
Gathering information (ExpressionInfo):

1. The level of the node, meaning the expression depth. This is increased every time there is an AND with a parent OR or an OR with a parent AND. If the parent is equal to the current node they are on the same expression depth.
2. The number of the node, which is the depth-first number in the expression tree.
3. The current state of negativity, currently not used (29/12/2016), but this is a Boolean that is inverted whenever a NOT node occurs. The idea behind this variable is that when an expression is negated in the final top-level expression

When it is detected that a row condition has been evaluated (the node is the top node in the current condition tree), the Comparisons are linked to their children for later distance calculation.
Both gathering information and linking Comparisons are done in a depth-first, recursive manner: first go down the left child and do everything that needs to be done there, then, if necessary, do the same for the right child.
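
The three pieces of ExpressionInfo can be illustrated with a small depth-first traversal in Python. This is a sketch under several assumptions, not the Instrumenter's actual code: the `Node` class and `annotate` function are hypothetical, numbering is pre-order and starts at 0, the root is at level 0, leaves and NOT nodes inherit the level of the enclosing AND/OR, and the negation flag applies to the descendants of a NOT rather than to the NOT node itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    op: str  # "AND", "OR", "NOT", or a leaf label such as "a"
    children: List["Node"] = field(default_factory=list)
    level: int = 0
    number: int = 0
    negated: bool = False

def annotate(root: Node) -> None:
    """Fill in level, depth-first number and negation state for every node."""
    counter = 0

    def visit(node: Node, connective: Optional[str], level: int, negated: bool) -> None:
        nonlocal counter
        is_connective = node.op in ("AND", "OR")
        # The level grows only when an AND sits under an OR or vice versa.
        if is_connective and connective is not None and node.op != connective:
            level += 1
        node.level, node.number, node.negated = level, counter, negated
        counter += 1
        # A NOT flips the negation state for everything below it.
        child_negated = (not negated) if node.op == "NOT" else negated
        child_connective = node.op if is_connective else connective
        for child in node.children:  # left child first, then right
            visit(child, child_connective, level, child_negated)

    visit(root, None, 0, False)

# a AND (b OR NOT (c AND d))
inner_and = Node("AND", [Node("c"), Node("d")])
inner_or = Node("OR", [Node("b"), Node("NOT", [inner_and])])
tree = Node("AND", [Node("a"), inner_or])
annotate(tree)
```

Here the root AND stays at level 0, the OR below it moves to level 1, and the inner AND moves to level 2 with its negation flag set.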

### HSQLDB codebase
The HSQLDB source code has been altered; each alteration is marked by a preceding comment of the form `//TUD_<FirstLast>`, such as `//TUD_JC`.
Changes include:

- Logical AND/OR optimization
- Query parsing AND/OR optimization
- Adding Comparisons to the Instrumenter (in the Expression classes)
- Indexing turned off

1 change: 1 addition & 0 deletions evaluation/.gitignore
@@ -0,0 +1 @@
/target/
77 changes: 77 additions & 0 deletions evaluation/collectResults.py
@@ -0,0 +1,77 @@
import os

def readScenarioResults(scenarioFolder, scenarioName, output, coverageOutput, dataOutput, first):
    resultsPath = scenarioFolder + "/" + scenarioName + "/results.csv"
    coveragePath = scenarioFolder + "/" + scenarioName + "/coverageResults.csv"
    dataOutputPath = scenarioFolder + "/" + scenarioName + "/dataOutput.csv"
    # Only proceed if results.csv exists
    if not os.path.isfile(resultsPath):
        return False

    # Get system and process no from scenarioName
    system = scenarioName[:scenarioName.find("-")]
    processNo = scenarioName[scenarioName.find("process") + 7:]

    with open(resultsPath) as f:
        lines = f.readlines()
        header = lines[0]
        lines = lines[1:]
        # Print header row if needed
        if first:
            output.write("System|Process Number|")
            output.write(header)

        for line in lines:
            output.write(system + "|" + processNo + "|")
            output.write(line)

    # Only proceed if coverageResults.csv exists
    if not os.path.isfile(coveragePath):
        return False

    with open(coveragePath) as f:
        lines = f.readlines()
        header = lines[0]
        lines = lines[1:]
        # Print header row if needed
        if first:
            coverageOutput.write("System|Process Number|")
            coverageOutput.write(header)

        for line in lines:
            coverageOutput.write(system + "|" + processNo + "|")
            coverageOutput.write(line)

    # Only proceed if dataOutput.csv exists
    if not os.path.isfile(dataOutputPath):
        return False

    with open(dataOutputPath) as f:
        lines = f.readlines()
        header = lines[0]
        lines = lines[1:]
        # Print header row if needed
        if first:
            dataOutput.write("System|Process Number|")
            dataOutput.write(header)

        for line in lines:
            dataOutput.write(system + "|" + processNo + "|")
            dataOutput.write(line)

    return True

def collectResults(scenarioFolder, outputPrefix):
    # Open output
    with open(scenarioFolder + "/" + outputPrefix + "results.psv", "w") as output:
        with open(scenarioFolder + "/" + outputPrefix + "coverageResults.psv", "w") as covOutput:
            with open(scenarioFolder + "/" + outputPrefix + "dataOutput.psv", "w") as dataOutput:
                first = True
                # Loop through each process folder
                for scenario in os.listdir(scenarioFolder):
                    if "process" in scenario:
                        if readScenarioResults(scenarioFolder, scenario, output, covOutput, dataOutput, first):
                            first = False

if __name__ == '__main__':
    collectResults("scenarios", "")