Code or tool for pre-processing source code #2

Open
betterenvi opened this issue Nov 19, 2018 · 20 comments

Comments

@betterenvi

Can you also provide the code or tool for pre-processing the source code
(parsing the source code, extracting API sequences, etc.)?
Thanks!

@ttbuffey

Dear Author,
We have evaluated the deep code search model from your paper on our own search codebase. We want to improve the current search performance, starting with the API sequence extraction, because we found some gaps when comparing our results with yours.

We currently parse the AST of each extracted function and then extract the API sequence from it; with this approach, dependencies on other classes are missing. According to Section 4.1.1 of the Deep API paper, the extraction is based on the whole project and uses an extraction algorithm that analyzes the dependencies of the whole project. Could you please share the code for the extraction algorithm and the API sequence extraction, or give us some pointers?
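
To make the gap described above concrete, here is a minimal Python sketch. It is not the paper's algorithm and not the Code2APIseq code. It assumes a method body has already been reduced to (receiver, method) pairs, and the function name `extract_api_sequence`, the two type maps, and the `Class.method` token format are all hypothetical. It only illustrates how a call on a field declared outside the method is lost when the function is analyzed in isolation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, pre-parsed view of one method body: each call is
# (receiver variable name, method name). A real extractor would get
# these from an AST; they are hard-coded here for illustration.
Call = Tuple[str, str]


def extract_api_sequence(
    calls: List[Call],
    local_types: Dict[str, str],
    project_types: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Resolve each receiver to a class name and emit 'Class.method' tokens.

    Calls whose receiver type cannot be resolved are dropped, which is the
    gap a function-only analysis runs into.
    """
    sequence = []
    for receiver, method in calls:
        # Locals shadow project-level declarations such as fields.
        receiver_type = local_types.get(receiver)
        if receiver_type is None and project_types is not None:
            receiver_type = project_types.get(receiver)
        if receiver_type is not None:
            sequence.append(f"{receiver_type}.{method}")
    return sequence


# 'reader' is declared inside the method; 'cache' is a field of the
# enclosing class, so only a wider analysis knows its type.
calls = [("reader", "readLine"), ("cache", "put"), ("reader", "close")]
local_types = {"reader": "BufferedReader"}
project_types = {"cache": "HashMap"}

function_only = extract_api_sequence(calls, local_types)
with_project = extract_api_sequence(calls, local_types, project_types)
print(function_only)
print(with_project)
```

With only the local declarations, the `cache.put` call disappears from the sequence; once the project-level type map is supplied, it is resolved and kept.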

Many thanks.

@guxd
Owner

guxd commented Aug 19, 2019

@ttbuffey please provide your email address.

@ttbuffey

ttbuffey commented Aug 19, 2019

@guxd [email protected]

@ttbuffey

@guxd I want to confirm: in the Code2APIseq.zip project, the entry file is Code2APIseq.java, and the project takes a function body as input and returns its API sequence, right?

It doesn't contain the code for analyzing the dependencies of the whole project, right?

@guxd
Owner

guxd commented Aug 20, 2019

yes, that part is omitted.

@ttbuffey

@guxd Thanks again for sharing the project with me. I think project dependency analysis is a key part of API sequence extraction, right? Could you please share that part of the code as well?

We have really hit a bottleneck in improving the search performance. Judging from the code you shared with me, implementing it myself would be very difficult and take a tremendous amount of time, especially since I'm not good at Java.

I have implemented per-function API extraction in Python using the javalang library (https://github.com/c2nes/javalang) that you shared with us earlier, but javalang doesn't support project dependency analysis.

I sincerely hope you can help again.

@guxd
Owner

guxd commented Aug 20, 2019

The code for dependency analysis is included in the package (see line 73 of ObjSeqBuilder.java). We just modified the main function.

@ttbuffey

thanks for your confirmation

@ttbuffey

ttbuffey commented Aug 22, 2019

@guxd I have read the code and have three questions, listed below. I would appreciate any guidance.

  • How is the whole-project dependency analysis implemented, and what does the main function look like? Could you describe the main steps? I checked line 73 of ObjSeqBuilder.java; as I understand it, it parses the project's dependencies, but I don't know how to pass the whole project's information to this function.

  • After the project dependencies are analyzed, they appear to be kept in the callGraph and Datagraph. How are they then applied when parsing the AST of a single function?

  • Regarding the current main function: I replaced the function body defined by the parameter code with our own code, and the result is empty. Looking at the code related to JDKAPI.java, it filters out calls that are not listed in this file. Can you explain the purpose of this file?

Thanks very much. I have tried to understand the code with my limited knowledge.🤦‍♀️

@ttbuffey

@guxd
Dear author, could you please give me some pointers on the above questions?
Sincerely wish you the best.

@ttbuffey

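@guxd Could you briefly describe the key steps in the main function for analyzing the whole project's dependencies? And how is the project information passed into the overall implementation?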

@JiyangZhang

> @guxd
> Dear author, could you please give me some pointers on the above questions?
> Sincerely wish you the best.

Hello, did you figure out how to do the dependency analysis?

@guxd
Owner

guxd commented Jan 3, 2020

@JiyangZhang We follow GrouMiner and its code for the project dependency analysis.
The code I provided to @ttbuffey was a simplified version of GrouMiner for a quick demo.

@JiyangZhang

> @JiyangZhang We follow GrouMiner and its code for the project dependency analysis.
> The code I provided to @ttbuffey was a simplified version of GrouMiner for a quick demo.

Thank you very much for the reply!
The link you gave does not seem to work.
Is all the code for extracting API sequences in Code2APISeq.zip at this Google Drive link? https://drive.google.com/drive/folders/1jBKMWZr5ZEyLaLgH34M7AjJ2v52Cq5vv

@guxd
Owner

guxd commented Jan 3, 2020

The link works on my network. Yes, it contains all the code for extracting API sequences.

@JiyangZhang

> The link works on my network. Yes, it contains all the code for extracting API sequences.

Many thanks.
But I am not sure I am using it correctly. I ran into the same problem as @ttbuffey.
I tried replacing the code with the example from Fig. 4 in your paper, but got 'Empty' as the result. I don't know why. Thanks.
Sorry for disturbing you.

@guxd
Owner

guxd commented Jan 3, 2020

Could you check whether the APIs in the example are included in JDKAPI.java?
@ttbuffey This file is used to filter out non-JDK APIs.
The 'Empty' result could be due to this filtering.
Besides, DeepAPI used a more complicated version of GrouMiner than the provided demo extractor.
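
As a rough illustration of why such a filter can yield an empty result, here is a minimal Python sketch. It is an assumption-laden stand-in, not the logic of JDKAPI.java, whose contents and format are not shown in this thread. The whitelist entries, the `filter_jdk_calls` helper, the `package.Class.method` call format, and the `com.example` names are all hypothetical.

```python
from typing import Iterable, List, Set

# Hypothetical stand-in for the whitelist in JDKAPI.java; the real file's
# contents and format are not shown in this thread.
JDK_API_WHITELIST: Set[str] = {
    "java.io.BufferedReader",
    "java.io.FileReader",
    "java.lang.StringBuilder",
}


def filter_jdk_calls(calls: Iterable[str], whitelist: Set[str]) -> List[str]:
    """Keep only 'package.Class.method' calls whose class is whitelisted."""
    kept = []
    for call in calls:
        # Split off the trailing method name to get the qualified class.
        class_name, _, _method = call.rpartition(".")
        if class_name in whitelist:
            kept.append(call)
    return kept


jdk_calls = ["java.io.FileReader.new", "java.io.BufferedReader.readLine"]
project_calls = ["com.example.search.Index.lookup", "com.example.search.Hit.score"]

# Calls on whitelisted JDK classes survive the filter.
print(filter_jdk_calls(jdk_calls, JDK_API_WHITELIST))
# A method that only calls project-specific classes is filtered to nothing.
print(filter_jdk_calls(project_calls, JDK_API_WHITELIST))
```

Under these assumptions, a method body that only uses classes outside the whitelist ends up with an empty sequence, which would match the 'Empty' result reported above.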

@JiyangZhang

> Could you check whether the APIs in the example are included in JDKAPI.java?
> @ttbuffey This file is used to filter out non-JDK APIs.
> The 'Empty' result could be due to this filtering.
> Besides, DeepAPI used a more complicated version of GrouMiner than the provided demo extractor.

Hi, could you share the code you used to extract the API sequences?
I am working on a project to create a dataset of methods' API sequences.

@guxd
Owner

guxd commented Jan 8, 2020

The raw code is not at hand right now. The demo code is very close to the code that DeepAPI used.

@Moshiii

Moshiii commented Oct 12, 2020

Hello!
I am a researcher as well. Could you please send me a copy of Code2API.zip too? I need to generate API sequences from new training samples. My email is [email protected]. Thanks!
