CodeIt is a search engine built with Node.js, Express, and EJS. It helps users find solutions to coding problems from popular platforms such as LeetCode, Codeforces, and InterviewBit. Queries pass through several natural language processing (NLP) steps (stopword removal, spell correction, lemmatization) before results are ranked with BM25, and a simple web interface handles query submission.
- Search Functionality: Allows users to search for coding problems based on their query.
- Stopword Removal: Removes common stopwords (like "is", "the") from search queries.
- Punctuation Removal: Automatically removes punctuation from user input to enhance search accuracy.
- Spell Correction: Implements spell-check and correction to handle typographical errors in queries.
- Lemmatization: Reduces words to their base form (e.g., "running" becomes "run").
- Number to Word Conversion: Converts numeric values in queries to their word equivalents (e.g., "123" becomes "one hundred twenty-three").
- BM25 Algorithm: Calculates relevance of documents using the BM25 ranking function.
- Title Similarity: Enhances search ranking based on string similarity between the search query and document titles.
- Document Storage: Stores coding problems from different platforms for efficient search.
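As a rough illustration of the preprocessing features listed above, the sketch below chains punctuation removal, stopword removal, and number-to-word conversion. The function names (`normalizeQuery`, `numberToWords`) and the tiny stopword list are assumptions made for this example, not the project's actual code, and spell correction and lemmatization are left out because they normally rely on a dictionary or an NLP library.

```javascript
// Hypothetical stopword list; a real one would be much larger.
const STOPWORDS = new Set(["is", "the", "a", "an", "of", "to", "in", "for", "and"]);

const ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
  "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen"];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
  "eighty", "ninety"];

// Converts an integer in the range 0..999 to English words.
function numberToWords(n) {
  if (n < 20) return ONES[n];
  if (n < 100) {
    const rest = n % 10;
    return TENS[Math.floor(n / 10)] + (rest ? "-" + ONES[rest] : "");
  }
  const rest = n % 100;
  return ONES[Math.floor(n / 100)] + " hundred" + (rest ? " " + numberToWords(rest) : "");
}

// Lowercases the query, strips punctuation, drops stopwords, and spells out
// numbers of up to three digits. Returns an array of tokens.
function normalizeQuery(query) {
  return query
    .toLowerCase()
    .replace(/[^\w\s]/g, " ") // replace punctuation with spaces
    .split(/\s+/)
    .filter(Boolean) // drop empty strings left by leading/trailing spaces
    .filter((word) => !STOPWORDS.has(word))
    .flatMap((word) =>
      /^\d{1,3}$/.test(word) ? numberToWords(Number(word)).split(" ") : [word]
    );
}

console.log(normalizeQuery("What is the 2 sum problem?"));
// -> ["what", "two", "sum", "problem"]
```

Numbers longer than three digits are passed through unchanged here to keep the sketch short.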
- Node.js: Backend server environment.
- Express.js: Web framework for handling routes and requests.
- EJS (Embedded JavaScript): Template engine for rendering dynamic web pages.
- Selenium: Used to scrape problems from coding platforms.
- Natural Language Processing (NLP): For processing and optimizing search queries.
CODE-IT
├── node_modules/
├── Problems/       # Directory containing scraped coding problems.
├── public/         # Public assets (CSS, JS, images).
├── views/          # EJS view templates.
├── idf.js          # IDF values for BM25.
├── keywords.js     # Keyword processing module.
├── length.js       # Length of documents.
├── magnitude.js    # Magnitude of TF-IDF vectors.
├── tf-gen.js       # Generates term frequency values.
├── index.js        # Main entry point.
├── package.json    # Dependencies and project info.
├── urls.js         # URLs for problems from coding platforms.
└── README.md       # Project documentation (this file).
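The layout above suggests that term frequencies, document lengths, and IDF values are precomputed and stored in separate files. The sketch below shows one way such statistics could be derived from tokenized documents. `buildIndex` is a hypothetical name, and the IDF variant used (with `1 +` inside the logarithm so that values stay non-negative) is an assumption, since the exact formula behind `idf.js` is not documented here.

```javascript
// Builds per-document term frequencies, document lengths, the average length,
// and IDF values from an array of already-tokenized documents.
function buildIndex(docs) {
  // Term frequency: one Map(term -> count) per document.
  const tf = docs.map((tokens) => {
    const counts = new Map();
    for (const token of tokens) counts.set(token, (counts.get(token) || 0) + 1);
    return counts;
  });

  const lengths = docs.map((tokens) => tokens.length);
  const avgLength = lengths.reduce((sum, len) => sum + len, 0) / (docs.length || 1);

  // Document frequency: number of documents that contain each term.
  const df = new Map();
  for (const counts of tf) {
    for (const term of counts.keys()) df.set(term, (df.get(term) || 0) + 1);
  }

  // Assumed IDF variant: log(1 + (N - n + 0.5) / (n + 0.5)).
  const totalDocs = docs.length;
  const idf = new Map();
  for (const [term, n] of df) {
    idf.set(term, Math.log(1 + (totalDocs - n + 0.5) / (n + 0.5)));
  }

  return { tf, lengths, avgLength, idf };
}

const index = buildIndex([
  ["two", "sum"],
  ["sum", "array", "array"],
]);
console.log(index.lengths, index.avgLength); // lengths [2, 3], average 2.5
```

Terms that appear in every document (such as "sum" above) receive a small IDF, so they contribute little to the ranking.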
- Clone the repository:
  git clone https://github.com/HrudayGurijala/Code-It-main.git
- Navigate to the project directory:
  cd Code-It-main
- Install the required dependencies:
  npm install
- Ensure you have the problem dataset (scraped coding problems) in the Problems/ directory.
- Start the server:
  npm start
- Open your browser and navigate to http://localhost:3000 to access the search engine.
- Query Processing: The search query is split into individual words, and stopwords and punctuation are removed.
- Keyword Matching: The search engine matches the processed keywords against a list of coding problems.
- BM25 Ranking: Documents are ranked based on the BM25 algorithm, which uses term frequency (TF) and inverse document frequency (IDF) to calculate relevance.
- Result Display: The top 10 results are displayed, sorted by relevance score. Users can click on any result to view the full problem description.
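The final ranking and display step can be sketched as follows, assuming each candidate already carries a relevance score. The title boost uses the Sorensen-Dice coefficient over character bigrams, which is one common string-similarity measure; whether the project uses this exact measure is not stated, and `titleSimilarity`, `topResults`, and `titleWeight` are hypothetical names chosen for the example.

```javascript
// Counts character bigrams in a string, ignoring case and whitespace.
function bigrams(text) {
  const clean = text.toLowerCase().replace(/\s+/g, "");
  const counts = new Map();
  for (let i = 0; i < clean.length - 1; i++) {
    const gram = clean.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Sorensen-Dice coefficient: 0 means no shared bigrams, 1 means identical sets.
function titleSimilarity(a, b) {
  const gramsA = bigrams(a);
  const gramsB = bigrams(b);
  let total = 0;
  for (const count of gramsA.values()) total += count;
  for (const count of gramsB.values()) total += count;
  if (total === 0) return 0;
  let shared = 0;
  for (const [gram, count] of gramsA) shared += Math.min(count, gramsB.get(gram) || 0);
  return (2 * shared) / total;
}

// results: array of { title, score }, where score is a relevance score such as BM25.
// titleWeight is an assumed tuning knob for how much the title match matters.
function topResults(query, results, limit = 10, titleWeight = 1.0) {
  return results
    .map((r) => ({
      ...r,
      finalScore: r.score + titleWeight * titleSimilarity(query, r.title),
    }))
    .sort((x, y) => y.finalScore - x.finalScore)
    .slice(0, limit);
}

const ranked = topResults("two sum", [
  { title: "Three Sum", score: 1.0 },
  { title: "Two Sum", score: 1.0 },
]);
console.log(ranked.map((r) => r.title)); // "Two Sum" is ranked first
```

Adding the similarity to the score is only one possible way to combine the two signals; a multiplicative boost or a separate tie-breaker would also fit the description.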
The BM25 algorithm calculates the relevance of a document D to a query Q using the following formula:
BM25(D, Q) = sum_over_terms((TF * (k1 + 1)) / (TF + k1 * (1 - b + b * (doc_len / avg_doc_len))) * IDF)
Where:
- TF is the term frequency in the document.
- IDF is the inverse document frequency.
- doc_len is the length of the document.
- avg_doc_len is the average document length.
- k1 and b are parameters that control term frequency saturation and document length normalization, respectively.
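The formula translates fairly directly into code. The sketch below is a minimal version, assuming term frequencies and IDF values are available as `Map` objects; the function name `bm25Score` and the defaults `k1 = 1.2` and `b = 0.75` are assumptions (they are commonly used values, but the values used by this project are not stated).

```javascript
// Scores one document against a list of query terms using BM25.
// docTf: Map(term -> frequency in this document)
// idf:   Map(term -> inverse document frequency)
// k1 and b default to commonly used values; the project's settings may differ.
function bm25Score(queryTerms, docTf, docLen, avgDocLen, idf, k1 = 1.2, b = 0.75) {
  let score = 0;
  for (const term of queryTerms) {
    const tf = docTf.get(term) || 0;
    if (tf === 0) continue; // the term does not occur in this document
    const termIdf = idf.get(term) || 0;
    // Length normalization: longer-than-average documents are penalized.
    const norm = 1 - b + b * (docLen / avgDocLen);
    score += (termIdf * (tf * (k1 + 1))) / (tf + k1 * norm);
  }
  return score;
}

const idf = new Map([["sum", 2.0]]);
const docTf = new Map([["sum", 1]]);
console.log(bm25Score(["two", "sum"], docTf, 10, 10, idf));
```

With `b = 0` the document length has no effect, while a larger `k1` lets repeated occurrences of a term keep raising the score for longer before it saturates.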
If you'd like to contribute to this project, feel free to submit a pull request or open an issue on the GitHub repository.
This project is licensed under the MIT License. See the LICENSE file for details.
If you have any questions, feel free to reach out:
- Email: [email protected]