Code for paper A Neural-Network based Code Summaization Approach by Using Source Code and its Call Dependencies
This tool is used to extract <code, comment, call dependency sequence> data from java projects.
To use this tool, run "java -jar jtags.jar --project-path --output-path"
Project-path is the project you want to analyse, and output-path is the extracted data to output to.
The result of this tool is 4 files, named as "code.data, comment.data, seq.data, tuple.json". Every line in the first three files has the format "id\tdata", and the ids of these files are related.
The "tuple.json" file is not used in our experiment. It save the source code and the entire related codes it called.
The data should be cleaned before training. We use python to do data clean process. To get the training data, do follows:
- put code.data seq.data and comment.data in the same folder of python source code.
- run callnn.py to get formated seq data, its name is formatseq.data
- remove the original "seq.data" and rename the "formatseq.data" to "seq.data", then run dataprocess.py
Our data can be found in the folder "call".
the model is based on https://github.com/eske/seq2seq and https://github.com/xing-hu/TL-CodeSum
-
put the prepared data into right position
-
The configuration of different models that we used is available in the folder "config", we use "call.yaml" in our experiment.
-
run "python3 main.py ../config/**.yaml --train -v" to train the model.