- Introduction
- Project Overview
- Requirements
- Installation
- Usage
- File Structure
- Dependencies
- Project Workflow
- Data Processing
- Data Analysis
- Output
- Contributors
- Conclusion
This documentation provides an overview of the Filter_X and Filter_XII Python scripts, which are part of a project aimed at processing and analyzing text data from student results and saving the results in an Excel file. These scripts are designed to extract, clean, and analyze data for classes X and XII, respectively.
The project consists of two Python scripts, Filter_X.py and Filter_XII.py, which filter and process data for Class X and Class XII students, respectively. The data extraction and analysis tasks are organized into classes within these scripts, which makes them easy to use and maintain.
To run the Data Filter and Analysis project, the following requirements must be met:
- Python 3.x installed on the system.
- Required Python modules: re (part of the Python standard library), pandas, openpyxl
- Required extensions: Jupyter Notebook
To install the Data Filter and Analysis project, follow these steps:
- Ensure Python 3.10 is installed on your system.
- Download the project files and save them to a directory of your choice.
- Open a terminal or command prompt and navigate to the project directory.
- Install the required Python modules by running the following command: pip install pandas openpyxl (the re module ships with Python and does not need to be installed).
- Once the dependencies are installed, open the project files in Jupyter Notebook.
- After uploading the files, run the Main.ipynb file.
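Before opening the notebook, it can help to confirm that the modules listed in the requirements are importable. The following is a minimal sketch of such a check; the helper name missing_modules is hypothetical and is not part of the project files.

```python
import importlib.util

# Modules named in the requirements list; `re` ships with Python itself.
REQUIRED_MODULES = ["re", "pandas", "openpyxl"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If the check reports a missing module, install it with pip and run the check again.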
The main functionality of the project is accessed through the Main.ipynb Jupyter notebook. To use the project, follow these steps:
- Import the necessary classes from the Python scripts:
    from Filter_X import DataProcessor_X
    from Filter_XII import DataProcessor_XII
- Create instances of the DataProcessor_X and DataProcessor_XII classes, passing the paths to the text files containing student data as arguments:
    data_processor_X = DataProcessor_X(input_file_X)
    data_processor_XII = DataProcessor_XII(input_file_XII)
- Use the methods provided by these classes to filter and analyze the data.
- Save the results in an Excel file:
    data_processor_X.save_data_to_excel(output_file_X)
    data_processor_X.save_analysis_to_excel(output_file_X)
    data_processor_XII.save_data_to_excel(output_file_XII)
    data_processor_XII.save_analysis_to_excel(output_file_XII)
The project has the following file structure:
- Main.ipynb is the Jupyter Notebook file for calling the scripts and running the analysis.
- Filter_X.py is the main script for data processing for Class X.
- Filter_XII.py is the main script for data processing for Class XII.
- input txt file for X contains the input text file of Class X.
- input txt file for XII contains the input text file of Class XII.
This project depends on the following Python libraries:
- re: For regular expression-based text data extraction.
- pandas: For data manipulation and DataFrame creation.
- openpyxl: For working with Excel files.
- openpyxl.styles: For styling Excel sheets.
- openpyxl.utils.dataframe: For converting DataFrames to Excel sheets.
The project workflow can be summarized as follows:
- Data extraction: Extract relevant information from the text files using regular expressions and convert it into a Pandas DataFrame.
- Data processing: Perform data cleaning and processing, including calculating grades and percentages.
- Data analysis: Analyze the data to generate various statistics, such as student percentage counts, subject-wise percentage counts, and highest marks.
- Save results: Save the analyzed data in separate sheets of an Excel file.
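The extraction step can be illustrated with a short sketch. The input layout, the patterns, and the function name extract_records below are all assumptions made for illustration; the real text files and the regular expressions in Filter_X.py and Filter_XII.py may look quite different.

```python
import re

# Hypothetical input layout (NOT the project's real format):
#   <roll number> <NAME IN CAPITALS> <SUBJECT> <marks> <SUBJECT> <marks> ...
SAMPLE_TEXT = """\
1001 ASHA VERMA ENG 088 MAT 092 SCI 079
1002 RAVI KUMAR ENG 071 MAT 065 SCI 090
"""

# One student per line: roll number, name, then subject/marks pairs.
LINE_PATTERN = re.compile(
    r"^(?P<roll>\d+) +(?P<name>[A-Z ]+?) +(?P<scores>(?:[A-Z]{3} +\d{1,3} *)+)$",
    re.MULTILINE,
)
# A three-letter subject code followed by its marks.
SCORE_PATTERN = re.compile(r"([A-Z]{3}) +(\d{1,3})")


def extract_records(text):
    """Return one dict per student line found in `text`."""
    records = []
    for match in LINE_PATTERN.finditer(text):
        record = {"Roll No": int(match["roll"]), "Name": match["name"].strip()}
        for subject, marks in SCORE_PATTERN.findall(match["scores"]):
            record[subject] = int(marks)
        records.append(record)
    return records


records = extract_records(SAMPLE_TEXT)
for record in records:
    print(record)
```

Passing the returned list of dicts to pandas.DataFrame gives a table with one row per student and one column per field.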
Data processing involves several steps:
- Data cleaning: Remove any inconsistencies or errors in the extracted data.
- DataFrame creation: Build a DataFrame from the cleaned data.
- Calculations: Compute the derived values, such as grades and percentages, needed for the analysis.
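The processing steps above might look roughly like the sketch below. The subject columns, the maximum marks, the grade cut-offs, and the choice to drop rows with non-numeric marks are all assumptions; the project's own scripts may clean and grade the data differently.

```python
import pandas as pd

# Hypothetical subject columns and maximum marks; adjust to the real data.
SUBJECTS = ["ENG", "MAT", "SCI"]
MAX_MARKS = 100

# Hypothetical grade cut-offs as (minimum percentage, grade), highest first.
GRADE_BANDS = [(90, "A"), (75, "B"), (60, "C"), (33, "D")]


def grade_for(percentage):
    """Return the first grade whose minimum the percentage reaches, else 'E'."""
    for minimum, grade in GRADE_BANDS:
        if percentage >= minimum:
            return grade
    return "E"


def process(raw_df):
    """Clean the marks columns, then add Total, Percentage and Grade."""
    df = raw_df.copy()
    # Cleaning: non-numeric marks (for example "AB") become NaN and the row is dropped.
    df[SUBJECTS] = df[SUBJECTS].apply(pd.to_numeric, errors="coerce")
    df = df.dropna(subset=SUBJECTS).reset_index(drop=True)
    # Calculations: total, percentage of the maximum possible marks, and grade.
    df["Total"] = df[SUBJECTS].sum(axis=1)
    df["Percentage"] = (df["Total"] / (MAX_MARKS * len(SUBJECTS)) * 100).round(2)
    df["Grade"] = df["Percentage"].apply(grade_for)
    return df


raw = pd.DataFrame(
    {
        "Name": ["ASHA VERMA", "RAVI KUMAR", "MEENA SHAH"],
        "ENG": ["88", "71", "AB"],
        "MAT": ["92", "65", "54"],
        "SCI": ["79", "90", "61"],
    }
)
processed = process(raw)
print(processed)
```

Dropping incomplete rows keeps the percentages comparable across students; a real implementation might instead keep such rows and flag them.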
The project performs the following data analysis tasks:
- percentage_counts_df: Counts the number of students falling into specific percentage ranges (e.g., Above 90%, 80%-89%, etc.).
- subject_percentage_count_df: Counts the number of students within each percentage range for each subject.
- highest_marks_df: Identifies students with the highest marks in each subject.
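As a rough illustration of how these three tables could be built with pandas, consider the sketch below. The DataFrame names follow the list above, but the band edges, labels, column names, and sample data are assumptions and are not taken from the project scripts.

```python
import pandas as pd

SUBJECTS = ["ENG", "MAT", "SCI"]  # hypothetical subject columns

# Hypothetical bands; each band includes its lower edge and excludes its upper edge.
BINS = [0, 60, 70, 80, 90, 101]
LABELS = ["Below 60%", "60%-69%", "70%-79%", "80%-89%", "90% and above"]


def band_counts(values):
    """Count how many values fall in each band, in the order of LABELS."""
    counts = pd.cut(values, bins=BINS, labels=LABELS, right=False).value_counts()
    return [int(counts.get(label, 0)) for label in LABELS]


def highest_marks(df):
    """List the top scorer(s) for each subject; ties produce several rows."""
    rows = []
    for subject in SUBJECTS:
        top = df[subject].max()
        for name in df.loc[df[subject] == top, "Name"]:
            rows.append({"Subject": subject, "Name": name, "Marks": int(top)})
    return pd.DataFrame(rows)


students = pd.DataFrame(
    {
        "Name": ["ASHA VERMA", "RAVI KUMAR", "MEENA SHAH"],
        "ENG": [88, 71, 95],
        "MAT": [92, 65, 54],
        "SCI": [79, 90, 61],
        "Percentage": [86.33, 75.33, 70.0],
    }
)

percentage_counts_df = pd.DataFrame(
    {"Range": LABELS, "Students": band_counts(students["Percentage"])}
)
subject_percentage_count_df = pd.DataFrame(
    {subject: band_counts(students[subject]) for subject in SUBJECTS}, index=LABELS
)
highest_marks_df = highest_marks(students)

print(percentage_counts_df)
print(subject_percentage_count_df)
print(highest_marks_df)
```

The sketch reuses one set of bands for overall percentages and for per-subject marks, which only makes sense when every subject is marked out of 100.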
The project generates an Excel file containing the following sheets:
- Result_Analysis_X: Contains the results of Class X students.
- Result_Analysis_XII: Contains the results of Class XII students.
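One possible way to write DataFrames into named sheets with openpyxl is sketched below, using the openpyxl.utils.dataframe and openpyxl.styles modules listed under the dependencies. The helper names add_sheet and build_workbook and the sample data are hypothetical, and the workbook is saved to an in-memory buffer only so the example leaves no file behind; this is not necessarily how save_data_to_excel or save_analysis_to_excel work.

```python
from io import BytesIO

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows


def add_sheet(workbook, title, df):
    """Append `df` to a new sheet named `title`, with a bold header row."""
    sheet = workbook.create_sheet(title=title)
    for row in dataframe_to_rows(df, index=False, header=True):
        sheet.append(row)
    for cell in sheet[1]:  # first row holds the column headers
        cell.font = Font(bold=True)
    return sheet


def build_workbook(frames):
    """Create a workbook with one sheet per (title, DataFrame) pair."""
    workbook = Workbook()
    workbook.remove(workbook.active)  # drop the default empty sheet
    for title, df in frames.items():
        add_sheet(workbook, title, df)
    return workbook


# Made-up sample rows, one DataFrame per sheet.
frames = {
    "Result_Analysis_X": pd.DataFrame({"Name": ["ASHA VERMA"], "Percentage": [86.33]}),
    "Result_Analysis_XII": pd.DataFrame({"Name": ["RAVI KUMAR"], "Percentage": [75.33]}),
}

buffer = BytesIO()
build_workbook(frames).save(buffer)
buffer.seek(0)
sheet_names = load_workbook(buffer).sheetnames
print(sheet_names)
```

To write a file on disk instead, pass a file path to save() in place of the buffer.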
The Data Filter and Analysis project was developed by [Saurav] as a school project.
This project provides a structured and efficient way to filter and analyze student data from text files. By following the steps outlined in this documentation, users can extract valuable insights from the data and store them in an organized Excel format.