Sourcery Starbot ⭐ refactored Liebmann5/Web_Scraper #2

Open
wants to merge 1 commit into
base: main

SourceryAI

Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

Here's your pull request refactoring your most popular Python repo.

If you want Sourcery to refactor all your Python repos and incoming pull requests, install our bot.

Review changes via command line

To manually merge these changes, make sure you're on the main branch, then run:

git fetch https://github.com/sourcery-ai-bot/Web_Scraper main
git merge --ff-only FETCH_HEAD
git reset HEAD^

Comment on lines -22 to +32
- if not all(field in data for field in expected_data):
+ if any(field not in data for field in expected_data):
      return False, 'Invalid data format'

  if data['Employment Type'] not in allowed_employment_types:
      return False, 'Invalid Employment Type'

  if data['Experience Level'] not in allowed_experience_levels:
      return False, 'Invalid Experience Level'

  #TODO: Add more checks like insurance it's within users country!!!

Function validate_job_data refactored with the following changes:
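
The rewritten condition in this hunk is an application of De Morgan's law: "not every field is present" means the same as "some field is missing". A minimal sketch of that equivalence follows; the helper names and the sample dictionaries are hypothetical and are not taken from the repository.

```python
def has_missing_field(data, expected_fields):
    """Refactored spelling: True when at least one expected field is absent."""
    return any(field not in data for field in expected_fields)


def has_missing_field_original(data, expected_fields):
    """Pre-refactor spelling of the same check."""
    return not all(field in data for field in expected_fields)


# Hypothetical sample data, used only to compare the two spellings.
expected = ["Employment Type", "Experience Level"]
complete = {"Employment Type": "Full-time", "Experience Level": "Entry"}
partial = {"Employment Type": "Full-time"}

for sample in (complete, partial, {}):
    # Both spellings agree on every input.
    assert has_missing_field(sample, expected) == has_missing_field_original(sample, expected)
```

Both forms short-circuit on the first missing field, so the change is about readability rather than behavior.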

Comment on lines 12 to 21

- # Create a signature
- signature = private_key.sign(

+ return private_key.sign(
      data,
      padding.PSS(
          mgf=padding.MGF1(hashes.SHA256()),
-         salt_length=padding.PSS.MAX_LENGTH
+         salt_length=padding.PSS.MAX_LENGTH,
      ),
-     hashes.SHA256()
- )
-
- # return signature.hex() #used this when I didn't have any default_backend code!!!
- return signature
\ No newline at end of file
+     hashes.SHA256(),
+ )

Function sign_data refactored with the following changes:

This removes the following comments (why?):

# return signature.hex()    #used this when I didn't have any default_backend code!!!
# Create a signature


Function CompanyWorkflow.company_workflow refactored with the following changes:

This removes the following comments (why?):

# NEW NEW NEW NEW
#TODO: refactor this!
#! FAILS: If "Internal-Job-Listings" is the initial webpage this ruins

Comment on lines -442 to +438
- language_of_webpage = predictions[0][0].replace('__label__', '')
- #TODO: Determine whether this should go here or somewhere else!!
- # if language_of_webpage == 'en':
- #     return True
- # else:
- #     return False
- # = = = =
- # return language_of_webpage == 'en'
- return language_of_webpage
+ return predictions[0][0].replace('__label__', '')

Function CompanyWorkflow.check_language_of_webpage refactored with the following changes:

This removes the following comments (why?):

#TODO: Determine whether this should go here or somewhere else!!
# = = = =
#     return False
# if language_of_webpage == 'en':
# else:
# return language_of_webpage == 'en'
#     return True
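
The TODO removed here asked whether the English check belongs inside check_language_of_webpage or somewhere else. One way to keep both options open is to leave the label-stripping function as refactored and put the boolean test in a separate helper. This is a sketch only: the `[["__label__en"]]` shape of `predictions` is an assumption inferred from the `predictions[0][0]` indexing in the hunk, and `is_english` is a hypothetical name.

```python
def label_to_language(predictions):
    """Strip the '__label__' prefix from the top prediction, as in the refactored line."""
    return predictions[0][0].replace("__label__", "")


def is_english(predictions):
    """Hypothetical helper holding the check from the removed commented-out code."""
    return label_to_language(predictions) == "en"


# Assumed shape: a list of label lists, one per input text.
sample_predictions = [["__label__en"]]
print(label_to_language(sample_predictions))
print(is_english(sample_predictions))
```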

Comment on lines -474 to +463
elements = {

return {

Function CompanyWorkflow.url_parser refactored with the following changes:

Comment on lines -1371 to +1364
- if arg == 'greenhouse':
-     print(method_name)
-     print(arg)
-     for key, value in kwargs.items():
-         print(key + ": " + str(value))
- elif arg == 'lever':
+ if arg in ['greenhouse', 'lever']:
      print(method_name)
      print(arg)
      for key, value in kwargs.items():
-         print(key + ": " + str(value))
+         print(f"{key}: {str(value)}")

Function CompanyWorkflow.print_companies_internal_job_opening refactored with the following changes:
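
The hunk merges two identical branches into one membership test and swaps string concatenation for an f-string. A rough sketch of the merged logic is below. It collects the lines in a list instead of printing them so the result can be inspected; that change, the function name `describe_job_opening`, and the sample arguments are assumptions made for illustration.

```python
def describe_job_opening(method_name, arg, **kwargs):
    """Build the lines the merged branch would print for a supported job board."""
    lines = []
    if arg in ("greenhouse", "lever"):
        lines.append(method_name)
        lines.append(arg)
        for key, value in kwargs.items():
            # An f-string formats the value itself, so str(value) is not needed.
            lines.append(f"{key}: {value}")
    return lines


# Hypothetical call; the keyword arguments are made up.
for line in describe_job_opening("print_job", "lever", title="Engineer", remote=True):
    print(line)
```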

Comment on lines -1439 to +1423
- if re.search(experience_needed, everything_about_job):
-     return False
- else:
-     return True
+ return not re.search(experience_needed, everything_about_job)

Function CompanyWorkflow.should_user_apply refactored with the following changes:
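
`re.search` returns a match object or `None`, so applying `not` to its result already yields the boolean the old if/else produced. The sketch below shows that behavior; the regular expression and the job texts are hypothetical examples, not values from the repository.

```python
import re


def should_user_apply(experience_needed, everything_about_job):
    """Return True when the experience pattern does NOT appear in the job text."""
    return not re.search(experience_needed, everything_about_job)


# Hypothetical pattern: phrases such as "5+ years".
pattern = r"\b\d+\+ years"
print(should_user_apply(pattern, "Requires 5+ years of Python"))
print(should_user_apply(pattern, "Entry level role, training provided"))
```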

- print("Result #" + str(self.job_links_counter) + " from Google Seaech")
+ print(f"Result #{self.job_links_counter} from Google Seaech")

Function scraperGoogle.print_google_search_results refactored with the following changes:

- print("Result #" + str(i+1) + " from Google Seaech")
+ print(f"Result #{str(i + 1)} from Google Seaech")

Function scraperGoogle.new_print_google_search_results refactored with the following changes:

- print("Result #" + str(i+1) + " from Google Seaech")
+ print(f"Result #{str(i + 1)} from Google Seaech")

Function scraperGoogle.new_new_print_google_search_results refactored with the following changes:
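
The three print_google_search_results variants receive the same f-string rewrite. Two small follow-ups may be worth considering: the `str()` call inside the braces is redundant, and the literal "Seaech" looks like a typo for "Search". The sketch below assumes both points and uses a hypothetical shared helper, `result_header`, that all three functions could call.

```python
def result_header(index):
    """Format the header for a zero-based result index (hypothetical shared helper)."""
    return f"Result #{index + 1} from Google Search"


# The explicit str() makes no difference inside an f-string.
assert f"{str(3 + 1)}" == f"{3 + 1}"

for i in range(2):
    print(result_header(i))
```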

Comment on lines -150 to -177
print("When you are done, type ONLY the number of your preferred web browser then press ENTER")
print(f"\t1) FireFox")
print(f"\t2) Safari")
print(f"\t3) Chrome")
print(f"\t4) Edge")
while True:
user_jobs = input()
user_jobs.strip()

if user_jobs == "1":
users_browser_choice = " FireFox "
break
elif user_jobs == "2":
users_browser_choice = " Safari "
break
elif user_jobs == "3":
users_browser_choice = " Chrome "
break
elif user_jobs == "4":
users_browser_choice = " Edge "
break
else:
print("That's kinda messed up dog... I give you an opportunity to pick and you pick nothing.")
print("You've squandered any further opportunities to decide stuff. I hope you are happy with yourself.")
print("Don't worry, the council shall discuss and provide a pick for you!")
#TODO: Make else just check OS and return number of that OS's web browser!!!
#! THIS IS A while loop.... so it runs until false
return users_browser_choice, browser_name

Function Workflow.users_browser_choice refactored with the following changes:

This removes the following comments (why?):

#TODO: Make else just check OS and return number of that OS's web browser!!!
#! THIS IS A while loop.... so it runs until false
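
Independently of what the refactor changed, the quoted menu code has one detail worth noting: `user_jobs.strip()` discards its result, because Python strings are immutable, so input with surrounding whitespace never matches "1" to "4". A possible table-driven alternative is sketched below. The names `BROWSER_CHOICES`, `parse_browser_choice` and `prompt_for_browser` are hypothetical, and the sketch drops the space padding the original puts around each browser name.

```python
# Menu numbers mapped to browser names, mirroring the printed menu.
BROWSER_CHOICES = {"1": "FireFox", "2": "Safari", "3": "Chrome", "4": "Edge"}


def parse_browser_choice(raw_text):
    """Return the browser for a menu entry, or None when the entry is invalid."""
    # strip() returns a new string, so its result has to be used.
    return BROWSER_CHOICES.get(raw_text.strip())


def prompt_for_browser(read_line=input):
    """Keep asking until a valid menu number is entered."""
    while True:
        choice = parse_browser_choice(read_line())
        if choice is not None:
            return choice
        print("Please type 1, 2, 3 or 4, then press ENTER.")
```

Passing `read_line` as a parameter keeps the loop testable without a terminal; calling `prompt_for_browser()` with no arguments falls back to `input`.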

Comment on lines -607 to +580
- def show_warning(message, category, filename, lineno, file=None, line=None):
-     print(f"Warning: {message}")
+ def show_warning(self, category, filename, lineno, file=None, line=None):
+     print(f"Warning: {self}")

Function Workflow.show_warning refactored with the following changes:
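
This rename deserves a second look before merging. The original parameter list matches the signature the standard library uses when it calls a replacement `warnings.showwarning` hook, where the first positional argument is the warning message. Assuming the function is installed as that hook (the diff does not show how it is used), renaming the first parameter to `self` keeps the behavior but hides what the argument is. A minimal sketch of the hook with the original names follows; `capture_warning_output` is a hypothetical helper added only to demonstrate it.

```python
import contextlib
import io
import warnings


def show_warning(message, category, filename, lineno, file=None, line=None):
    """Replacement hook; warnings passes the message as the first positional argument."""
    print(f"Warning: {message}")


def capture_warning_output(text):
    """Install the hook temporarily, emit one warning, and return what was printed."""
    buffer = io.StringIO()
    with warnings.catch_warnings():  # restores filters and showwarning on exit
        warnings.simplefilter("always")
        warnings.showwarning = show_warning
        with contextlib.redirect_stdout(buffer):
            warnings.warn(text)
    return buffer.getvalue()


print(capture_warning_output("disk almost full"), end="")
```

If the function is meant to stay inside the Workflow class, decorating it with `@staticmethod` would be one way to keep the `message` name without it being read as an instance reference.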

Comment on lines -718 to +711
- cleaned_text = clean(text,
-     fix_unicode=True, # fix various unicode errors
-     to_ascii=True, # transliterate to closest ASCII representation
-     lower=False, # lowercase text
-     no_line_breaks=remove_breaks, # fully strip line breaks as opposed to only normalizing them
-     no_urls=True, # replace all URLs with a special token
-     no_emails=True, # replace all email addresses with a special token
-     no_phone_numbers=True, # replace all phone numbers with a special token
-     no_numbers=False, # replace all numbers with a special token
-     no_digits=False, # replace all digits with a special token
-     no_currency_symbols=True, # replace all currency symbols with a special token
-     no_punct=False, # remove punctuations
-     replace_with_punct="", # instead of removing punctuations you may replace them
-     replace_with_url="",
-     replace_with_email="",
-     replace_with_phone_number="",
-     replace_with_number="",
-     replace_with_digit="0",
-     replace_with_currency_symbol="",
-     lang="en" # set to 'de' for German special handling
- )
- return cleaned_text
+ return clean(
+     text,
+     fix_unicode=True, # fix various unicode errors
+     to_ascii=True, # transliterate to closest ASCII representation
+     lower=False, # lowercase text
+     no_line_breaks=remove_breaks, # fully strip line breaks as opposed to only normalizing them
+     no_urls=True, # replace all URLs with a special token
+     no_emails=True, # replace all email addresses with a special token
+     no_phone_numbers=True, # replace all phone numbers with a special token
+     no_numbers=False, # replace all numbers with a special token
+     no_digits=False, # replace all digits with a special token
+     no_currency_symbols=True, # replace all currency symbols with a special token
+     no_punct=False, # remove punctuations
+     replace_with_punct="", # instead of removing punctuations you may replace them
+     replace_with_url="",
+     replace_with_email="",
+     replace_with_phone_number="",
+     replace_with_number="",
+     replace_with_digit="0",
+     replace_with_currency_symbol="",
+     lang="en", # set to 'de' for German special handling
+ )

Function Workflow.clean_gpt_out refactored with the following changes:

Comment on lines -745 to +720

device = 0 if torch.cuda.is_available() else -1
generator = pipeline("text-generation", model=model_name, device=device)

Function Workflow.test_gpt_neo refactored with the following changes:

Comment on lines -216 to -220
- print(f"Alright the next big setup is SpaCy!")
+ print("Alright the next big setup is SpaCy!")
  print("\t1) en_core_web_sm => 12 MB")
  print("\t2) en_core_web_md => 40 MB")
  print("\t3) en_core_web_lg => 560 MB")
- #If this is chosen you want to run => 'python -m spacy download en_core_web_lg'

Function UntouchedUser.set_spacy refactored with the following changes:

This removes the following comments (why?):

#If this is chosen you want to run => 'python -m spacy download en_core_web_lg'

Comment on lines -59 to -62
if input_data['label'] is None:
print("Dang so -> == None ...straight-up")
continue

Function process_form_inputs refactored with the following changes:

This removes the following comments (why?):

#self.fill_form(label, answer)
#! .get_matching_keys() does all the comaparing to get the right answer!!!!! ssooo there do   special case check -> .env chack -> long q>a ... a>a check!!!

Comment on lines -50 to -55
- success = self.troubleshoot_form_filling(element, value)
- if not success:
-     print("Failed to fill in the form. See the error messages above for details.")
- else:
+ if success := self.troubleshoot_form_filling(element, value):
      print("Successfully filled in the form.")

Function fill_that_form refactored with the following changes:
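
The hunk combines the assignment and the test with an assignment expression (the `:=` operator, available from Python 3.8) and puts the success branch first. A small sketch of the resulting control flow follows. The `troubleshoot` callable stands in for `self.troubleshoot_form_filling`, and returning a tuple instead of printing is an assumption made so the result can be checked.

```python
def fill_and_report(troubleshoot, element, value):
    """Run the troubleshooting step once and report the outcome."""
    # The walrus operator binds the result and tests it in the same expression.
    if success := troubleshoot(element, value):
        message = "Successfully filled in the form."
    else:
        message = "Failed to fill in the form. See the error messages above for details."
    return bool(success), message


# Hypothetical stand-ins for the real form-filling step.
print(fill_and_report(lambda element, value: True, "email", "a@example.com"))
print(fill_and_report(lambda element, value: False, "email", "a@example.com"))
```

If `success` is not read again after the condition, a plain `if self.troubleshoot_form_filling(element, value):` would do the same job without the extra name.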

- if users_app_current_version < app_current_version:
-     return True
- return False
+ return users_app_current_version < app_current_version

Function check_for_update refactored with the following changes:
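
Returning the comparison directly is equivalent to the old if/return pair. One caveat applies if the two versions are stored as strings, which the diff does not show: string comparison is lexicographic, so "1.9.0" does not sort before "1.10.0". The sketch below assumes dotted numeric version strings and compares them as integer tuples; `parse_version` is a hypothetical helper.

```python
def parse_version(version_text):
    """Turn a dotted numeric version such as '1.10.0' into a tuple of ints."""
    return tuple(int(part) for part in version_text.split("."))


def check_for_update(users_app_current_version, app_current_version):
    """Return True when the user's version is older than the current one."""
    return parse_version(users_app_current_version) < parse_version(app_current_version)


# Plain string comparison gets this case wrong; tuple comparison does not.
assert ("1.9.0" < "1.10.0") is False
assert check_for_update("1.9.0", "1.10.0") is True
```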

- #Validate and add job data
- result = add_job(job)
- return result
+ return add_job(job)

Function add_job_endpoint refactored with the following changes:

This removes the following comments (why?):

#Validate and add job data

- #Validate and add user data
- result = add_user(user)
- return result
+ return add_user(user)

Function add_user_endpoint refactored with the following changes:

This removes the following comments (why?):

#Validate and add user data
