Todos

Refactoring

Training sets

  • Generation of predicates
    • Use respective B model classes of ProB2
    • Substitute definitions
    • Generation of predicate lists
  • Generation of training set
    • From directory of machine files
    • From predicate lists
    • Training data formats
      • Loading of samples from the format
    • Create DB
    • Create features and labels
    • Skip already existing data
      • Naively by timestamp
      • Optionally by versioned generation step
    • Generation statistics
      • Return statistics after creation of data
    • Wrap StateSpaces
    • CLI Option to create data
    • Incorporate BPredicate/BElement classes more
  • Training set manipulation
    • Split training set
    • Upsample training set
    • Downsample training set
    • Shuffle training set
    • Shuffle large data sets that do not fit entirely into memory
  • Training set analysis
    • Classification analysis tool
    • Regression analysis tool
    • PredicateDb analysis
  • Database translation
    • Migrate from old pdump to new JSON
    • Translate Db to Training Format
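
The item on shuffling data sets that do not fit into memory could be approached with a two-pass external shuffle. The following is only a minimal sketch, not the project's implementation: it assumes a line-based sample file (one sample per line), and the function name `shuffle_large_file` and the parameter `num_buckets` are hypothetical.

```python
import os
import random
import tempfile


def shuffle_large_file(src_path, dst_path, num_buckets=16, seed=None):
    """Shuffle the lines of src_path into dst_path without loading the whole file.

    Pass 1 scatters every line into a randomly chosen bucket file.
    Pass 2 loads one bucket at a time, shuffles it in memory, and appends it.
    Assumption: a single bucket (about 1/num_buckets of the data) fits into memory.
    """
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp_dir:
        bucket_paths = [
            os.path.join(tmp_dir, f"bucket_{i}.txt") for i in range(num_buckets)
        ]
        buckets = [open(path, "w", encoding="utf-8") for path in bucket_paths]
        try:
            with open(src_path, "r", encoding="utf-8") as src:
                for line in src:
                    # Normalise the line ending so that a final line without
                    # a newline cannot merge with the line written after it.
                    rng.choice(buckets).write(line.rstrip("\n") + "\n")
        finally:
            for bucket in buckets:
                bucket.close()

        with open(dst_path, "w", encoding="utf-8") as dst:
            for path in bucket_paths:
                with open(path, "r", encoding="utf-8") as bucket:
                    lines = bucket.readlines()
                rng.shuffle(lines)  # in-memory shuffle of one bucket only
                dst.writelines(lines)
```

Peak memory is bounded by the largest bucket rather than by the whole data set, so `num_buckets` would be chosen from the file size and the available memory.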

Machine Learning algorithms

  • Training of neural networks

Other

  • Enhance versioning of Backends

Planned features

  • Appending to a format
  • Enhanced documentation
    • Documentation of JSON entries, (legacy) predicate dumps, and other formats
    • usage examples
      • Training data generation
      • Training data migration
      • Training/using neural networks
  • RNN support
    • Set training set structure
      • Set/implement appropriate RecordReader
    • Create RNNTrainingDataGenerator(s)
    • Create RNN features
      • Raw predicate features
      • Predicate AST features
  • Data augmentation utilities
    • PCA: whitening
    • Add type information to identifiers
      • Hungarian notation: "x + y" -> "xInt + yInt"
      • Joshua notation: "x + y" -> "i1 + i2"
    • Normalisation of predicates
  • Enhanced analysis of feature sets
    • Feature dimensionality reduction
      • PCA
      • RBM
      • By decision trees
    • t-SNE
  • Decision trees
    • Random forests
    • Deep Forest
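
The two identifier-typing schemes listed under data augmentation (Hungarian and Joshua notation) might look roughly as follows. This is a sketch under assumptions, not the planned implementation: identifier types are passed in as a plain dictionary (in practice they would presumably come from a type checker), identifiers are found with a simple regular expression rather than a B parser, and the function names `hungarian` and `joshua` are hypothetical.

```python
import re

# Assumption: identifiers are plain alphanumeric names; a real B parser
# would be needed for a robust implementation.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def hungarian(predicate, types):
    """Append each identifier's type name: "x + y" -> "xInt + yInt"."""
    def rename(match):
        name = match.group(0)
        # Names without a known type (e.g. keywords) are left untouched.
        return name + types[name] if name in types else name
    return IDENTIFIER.sub(rename, predicate)


def joshua(predicate, types):
    """Replace identifiers by type initial plus running index: "x + y" -> "i1 + i2"."""
    renamed = {}   # original name -> replacement
    counters = {}  # type initial -> number of names handed out so far

    def rename(match):
        name = match.group(0)
        if name not in types:
            return name
        if name not in renamed:
            prefix = types[name][0].lower()
            counters[prefix] = counters.get(prefix, 0) + 1
            renamed[name] = f"{prefix}{counters[prefix]}"
        return renamed[name]

    return IDENTIFIER.sub(rename, predicate)


types = {"x": "Int", "y": "Int"}
print(hungarian("x + y", types))  # xInt + yInt
print(joshua("x + y", types))     # i1 + i2
```

In this sketch the second scheme numbers identifiers in order of first appearance, so repeated occurrences of the same identifier map to the same replacement.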