Commit

Merge pull request #1 from ZekeriyaAY/devZ
DevZ
ZekeriyaAY authored Oct 30, 2022
2 parents 5de1eac + ae750ce commit c003805
Showing 93 changed files with 10,160 additions and 130 deletions.
147 changes: 18 additions & 129 deletions .gitignore
@@ -1,129 +1,18 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
### PyCharm ###
.idea/
.idea*

### Visual Studio Code ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets

# Local History for Visual Studio Code
.history/
.history

# Built Visual Studio Code Extensions
*.vsix
15 changes: 14 additions & 1 deletion README.md
@@ -1 +1,14 @@
# webcrawler3003
# 🕷 webcrawler3003
Still in progress.

## 📚 Installation

```
pip3 install -r requirements.txt
```

## 🚀 To-Do
- [ ] Complete the project that can crawl a website 🤔
- [ ] GUI(PyQt)
- [ ] Multi-thread(Threading)
- [ ] Test Website
29 changes: 29 additions & 0 deletions main.py
@@ -0,0 +1,29 @@
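from time import sleep

import requests
from bs4 import BeautifulSoup

# Every href seen so far, in discovery order (stored exactly as written in the page).
pages = []


def get_links(directory, url):
    """Fetch url + directory and recursively follow every href not seen before."""
    try:
        # The timeout keeps a stalled server from hanging the crawl.
        html = requests.get(f'{url}{directory}', timeout=10).text
    except requests.RequestException:
        # Network error: back off briefly and give up on this page.
        sleep(1)
        return
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a"):
        new_page = link.attrs.get("href")
        if new_page is not None and new_page not in pages:
            pages.append(new_page)
            get_links(new_page, url)


def main():
    get_links("", "http://localhost:5500/")
    print("\n".join(pages))


if __name__ == "__main__":
    main()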
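
The recursive `get_links` above stores each href exactly as written and appends it to the base URL, so relative and absolute links are treated the same way and a deep site could exhaust the recursion limit. As a minimal sketch of one alternative (not the commit's code), the block below crawls iteratively with a queue, resolves each href with `urljoin`, and stays on the start URL's host. `LinkCollector`, `crawl`, and the `fetch` callable are hypothetical names, and the standard-library `html.parser` is swapped in for BeautifulSoup only to keep the example dependency-free.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl limited to start_url's host.

    fetch is a hypothetical callable: URL -> HTML text, or None on failure.
    Returns the visited URLs in discovery order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    order = [start_url]
    queue = deque([start_url])
    while queue:
        current = queue.popleft()
        html = fetch(current)
        if html is None:
            continue
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative hrefs against the current page and drop #fragments.
            absolute, _ = urldefrag(urljoin(current, href))
            # Skip other hosts (and schemes such as mailto:) and known pages.
            if urlparse(absolute).netloc != host or absolute in seen:
                continue
            seen.add(absolute)
            order.append(absolute)
            queue.append(absolute)
    return order


# In-memory stand-in for the local test site, so no server is needed.
site = {
    "http://localhost:5500/": '<a href="/">Home</a><a href="/about">About</a>',
    "http://localhost:5500/about": '<a href="/">Home</a><a href="https://uludag.edu.tr">Uni</a>',
}
print(crawl("http://localhost:5500/", site.get))
```

In the real script, `fetch` could wrap `requests.get(url, timeout=10).text` and return `None` on `requests.RequestException`; that wiring is an assumption and is not shown in the commit.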
7 changes: 7 additions & 0 deletions requirements.txt
@@ -0,0 +1,7 @@
requests==2.28.1
certifi==2022.9.24
charset-normalizer==2.1.1
urllib3==1.26.12
idna==3.4
beautifulsoup4==4.11.1
soupsieve==2.3.2.post1
60 changes: 60 additions & 0 deletions test/404.html
@@ -0,0 +1,60 @@
<!DOCTYPE html>
<html lang="en"><head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">


<title>🐧 404 Page not found - Berkay Çubuk</title>
<meta property="og:title" content="404 Page not found - Berkay Çubuk">
<meta name="twitter:title" content="404 Page not found - Berkay Çubuk">
<meta itemprop="name" content="404 Page not found - Berkay Çubuk">
<meta name="application-name" content="404 Page not found - Berkay Çubuk">
<meta name="og:site_name" content="404 Page not found - Berkay Çubuk">

<meta name="description" content="Computer Engineer, who can talk with computers &amp; websites">
<meta itemprop="description" content="Computer Engineer, who can talk with computers &amp; websites">
<meta property="og:description" content="Computer Engineer, who can talk with computers &amp; websites">
<meta name="twitter:description" content="Computer Engineer, who can talk with computers &amp; websites">

<meta name="robots" content="index,follow">
<meta name="HandheldFriendly" content="True">


<link rel="stylesheet" type="text/css" href="http://localhost:5500//css/prism.css" />
<link rel="stylesheet" type="text/css" href="http://localhost:5500//css/main.css" />

</head>
<body>

<header>
<div class="title">Berkay Çubuk</div>
<nav>
<a href="/">Home</a>
<a href="/blog">Blog</a>
<a href="/about">About</a>
<a href="/contact">Contact</a>
</nav>
</header>



<main>
<h1>404 Not Found</h1>

</main>


<footer>
<p class="text-center"><span class="comment">// Last update: Oct 30, 2022 20:27 (&#43;03)</span></p>
<div class="d-flex justify-content-center">
<a href="/privacy">Privacy</a>
</div>
</footer>





<script src="http://localhost:5500/js/prism.js"></script>
</body>
</html>
72 changes: 72 additions & 0 deletions test/about/index.html
@@ -0,0 +1,72 @@
<!DOCTYPE html>
<html lang="en"><head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">


<title>🐧 About - Berkay Çubuk</title>
<meta property="og:title" content="About - Berkay Çubuk">
<meta name="twitter:title" content="About - Berkay Çubuk">
<meta itemprop="name" content="About - Berkay Çubuk">
<meta name="application-name" content="About - Berkay Çubuk">
<meta name="og:site_name" content="About - Berkay Çubuk">

<meta name="description" content="About me">
<meta itemprop="description" content="About me">
<meta property="og:description" content="About me">
<meta name="twitter:description" content="About me">

<meta name="robots" content="index,follow">
<meta name="HandheldFriendly" content="True">


<link rel="stylesheet" type="text/css" href="http://localhost:5500//css/prism.css" />
<link rel="stylesheet" type="text/css" href="http://localhost:5500//css/main.css" />

</head>
<body>

<header>
<div class="title">Berkay Çubuk</div>
<nav>
<a href="/">Home</a>
<a href="/blog">Blog</a>
<a href="/about">About</a>
<a href="/contact">Contact</a>
</nav>
</header>



<main>
<section>
<h1>About</h1>


</section>

<section class="post-content">
<p>My name is Berkay Çubuk. I program and develop things that work on servers, websites and even your computer. I live and work in Bursa, Turkey.</p>
<p>Currently I study at the <a href="https://uludag.edu.tr">Bursa Uludağ University</a> and I maintain <a href="https://github.com/berkaycubuk">open-source</a> projects and solve problems.</p>
<p>I have developed e-commerce and landing page sites for my clients and experimented with social media websites and mobile apps. I am now looking forward to working on other great projects and experimenting with other
languages.</p>

</section>

</main>


<footer>
<p class="text-center"><span class="comment">// Last update: Oct 30, 2022 20:27 (&#43;03)</span></p>
<div class="d-flex justify-content-center">
<a href="/privacy">Privacy</a>
</div>
</footer>





<script src="http://localhost:5500/js/prism.js"></script>
</body>
</html>