Set environment variable
Mac
export MYVARIABLE=123
echo $MYVARIABLE
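Once exported, the variable is visible to processes started from that shell, for example a Python script. A minimal sketch, assuming the variable name MYVARIABLE from above; the helper name read_setting and the fallback value are hypothetical:

```python
import os


def read_setting(name: str, default: str = "") -> str:
    """Return the value of an environment variable, or a default if it is unset."""
    return os.environ.get(name, default)


# Simulate `export MYVARIABLE=123` for this process only.
os.environ["MYVARIABLE"] = "123"

print(read_setting("MYVARIABLE"))
print(read_setting("SURELY_UNSET_VARIABLE_XYZ", default="fallback"))
```

Environment values always arrive as strings, so convert them (e.g. with int()) where a number is needed.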
curl http://example.com
curl -X POST "http://example.com/api/endpointname" -H "accept: application/json" -H "Content-Type: application/json" -d '{"parameter1": "Content of parameter1", "parameter2": "Content of parameter2"}'
Mac
sudo lsof -i :8000
sudo kill -9 68102
Mac
mkdir myfolder
touch myfile.txt
rm -r myfolder myfile.txt
Mac
vim myfile.txt
nano myfile.txt
Tips:
Vue: When creating a new Git repo for a Vue project, do not forget to also push the hidden files (to view them in Finder, press "Command+Shift+."). Otherwise git clone & npm install will fail when setting up the project.
Cloudflare deployment:
- Pushing updates to Git automatically triggers the build in Cloudflare. Inspect the logs in the left menu tab under Workers & Pages to see whether the deployment fails. Copy the new URL endpoint (i.e. "....pages.dev").
- In the left menu, click the "Websites" tab (first tab) and choose the website (middle). Then click the "DNS" tab in the left menu. Replace the old URL endpoint with the new one in the DNS configuration.
GPU memory:
- T4 -> 16 GB VRAM
- V100 -> 16 GB VRAM
- L2, L4 -> 24 GB VRAM
- Quadro RTX 6000, Quadro RTX 8000 -> (24, 48) GB VRAM
- A100 -> (40, 80) GB VRAM
- H100 -> 80 GB VRAM
git --help
git branch "…"
git checkout -b "…"
git status
git add *
git commit -m "First commit"
git tag -a v0.1.0 -m "v0.1.0"
git tag -a v0.1.1 -m "v0.1.1-new-feature-description"
git push --follow-tags
git push
git pull
git reset --hard HEAD
git switch "Name of branch"
git checkout "Name of branch or document/script"
git log
git log --oneline
git log --pretty=format:"%h - %an, %ar : %s"
git log --graph --decorate --oneline
git stash
git stash pop
git stash apply (stash stays accessible across branches)
git stash push -m "Add a description here"
git stash list
git stash apply stash@{0}
Assuming you are currently on the main branch and want to merge the feature branch into it:
git merge feature_branch
git pull origin feature_branch
Local branch (-d safe delete, -D force delete)
git branch -d feature-branch
git branch -D feature-branch
Remote branch
git push origin --delete feature-branch
git clone name_of_repository
"""
Description:
----
Description of function. This is '''bold'''. This is ''italic''.
Args:
----
parameter1 (list): A list of data.
parameter2 (str): A string describing the data.
Returns:
----
variable1 (list): Edited list.
Raises:
----
KeyError: Raises an exception.
"""
pip install --upgrade pip
pip install virtualenv
Mac
python3.11 -m venv .venv
source .venv/bin/activate
deactivate
pip install -r requirements.txt
Windows
py -3.11 -m venv .venv
.venv\Scripts\activate.bat
.venv\Scripts\deactivate.bat
pip install -r requirements.txt
...
pip install ruff
ruff format .
ruff check . --fix --select I
import argparse
parser = argparse.ArgumentParser(description="Some description")
parser.add_argument('-d','--dir',type=str,default='training2017',help='The directory of the dataset')
parser.add_argument('-t','--test_set',type=float,default=0.2,help='The percentage of test set')
args = parser.parse_args()
my_function(args.dir, test=args.test_set)
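parse_args() also accepts an explicit list of strings, which makes the parser easy to try out without a command line. A minimal sketch that rebuilds the parser from the notes above; the wrapper name build_parser and the sample values are hypothetical:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the same two-option parser as in the notes above."""
    parser = argparse.ArgumentParser(description="Some description")
    parser.add_argument("-d", "--dir", type=str, default="training2017", help="The directory of the dataset")
    parser.add_argument("-t", "--test_set", type=float, default=0.2, help="The fraction of the data used as test set")
    return parser


# Passing a list bypasses sys.argv, which is handy in tests and notebooks.
args = build_parser().parse_args(["--dir", "my_data", "-t", "0.3"])
print(args.dir, args.test_set)

# An empty list yields the defaults.
defaults = build_parser().parse_args([])
print(defaults.dir, defaults.test_set)
```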
import warnings
warnings.filterwarnings("ignore")
import logging
logging.basicConfig(filename='app.log', level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger_module1 = logging.getLogger('module1')
logger_module1.setLevel(logging.INFO)
logger_module2 = logging.getLogger('module2')
logger_module2.setLevel(logging.WARNING)
logger_module1.error("An error occurred: %s", str(e), exc_info=True)
logger_module2.error("An error occurred: %s", str(e), exc_info=True)
try:
    assert …
except Exception as e:
    raise e
For more, see: https://docs.python.org/3/library/exceptions.html
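The logging and try/except snippets above are typically combined: the logger call sits inside the except block, where e is defined. A minimal sketch, assuming console output instead of app.log; the function name safe_divide is hypothetical:

```python
import logging

# Log to the console here; pass filename="app.log" to log to a file instead.
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
logger = logging.getLogger("module1")


def safe_divide(a: float, b: float):
    """Divide a by b; log the error and return None if b is zero."""
    try:
        return a / b
    except ZeroDivisionError as e:
        # exc_info=True appends the full traceback to the log record.
        logger.error("An error occurred: %s", e, exc_info=True)
        return None


print(safe_divide(6, 3))
print(safe_divide(1, 0))
```

Catching a specific exception (here ZeroDivisionError) rather than Exception keeps unrelated bugs visible.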
pip install pytest
All tests are within /tests directory
conftest.py defines @fixtures
assert x
Imports within submodules are always resolved from the directory of the executed main script. Submodules only know files within the directory of the executed main script or within their own folder. To make parent folders of the submodule importable, do the following:
pip install path
import path
import sys
directory = path.Path(__file__).abspath()
sys.path.append(directory.parent.parent)
import ..
directory.parent -> folder of current file
directory.parent.parent -> parent folder
More: https://stackoverflow.com/questions/30669474/beyond-top-level-package-error-in-relative-import
import sys
sys.path.append("..")
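The same idea also works with the standard library only. A minimal sketch, assuming pathlib instead of the third-party path package; the helper name add_ancestor_to_sys_path and the sample path are hypothetical:

```python
import sys
from pathlib import Path


def add_ancestor_to_sys_path(script_file: str, levels_up: int = 2) -> str:
    """Append an ancestor folder of script_file to sys.path and return it.

    levels_up=1 is the folder of the file, levels_up=2 its parent folder.
    """
    ancestor = str(Path(script_file).resolve().parents[levels_up - 1])
    if ancestor not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.append(ancestor)
    return ancestor


# In a real script, pass __file__ instead of this hypothetical path.
print(add_ancestor_to_sys_path("project/src/module/script.py"))
```

Unlike sys.path.append(".."), this does not depend on the directory the script was started from.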
Opening files works differently from imports: a relative file path is resolved from the current working directory (usually the project folder that contains .venv), not from the folder of the script. Also, do not put "/" at the beginning of the relative path, otherwise the path will not be resolved correctly. A better way to open files is to build the full path (here too, do not start the second path with "/"):
import os
current_path = os.getcwd() # or os.path.dirname(os.path.abspath(__file__))
file_name = "src/folderX/file.txt"
full_file_path = os.path.join(current_path, file_name)
full_file_path = os.path.join(os.getcwd(), file_name)
import os
print(os.getcwd()) # Working directory (usually the project folder that contains .venv)
print(os.path.dirname(os.path.abspath(__file__))) # Directory of the current script
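Why the leading "/" matters: os.path.join() discards everything before an absolute component. A minimal sketch, assuming a hypothetical POSIX-style base folder:

```python
import os

base_dir = "/home/user/project"  # hypothetical folder, POSIX-style path

# Relative second argument: appended to base_dir as intended.
good = os.path.join(base_dir, "src/folderX/file.txt")

# A leading "/" makes the second argument absolute, so base_dir is discarded.
bad = os.path.join(base_dir, "/src/folderX/file.txt")

print(good)
print(bad)
```

On macOS and Linux the second result is just /src/folderX/file.txt, which is why the file is then not found.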
import os
if os.path.exists(file_path):
    os.remove(file_path)
f = open("file.txt", "r", encoding="utf-8")
data = f.readlines()
f.close()
f = open("file.txt", "w", encoding="utf-8")
f.writelines(data)
# f.write(data)
f.close()
with open('file.txt', 'w', encoding='utf-8') as file:
    file.writelines(html_content)
import os
directory = 'src/dataset'
for dirpath, dirnames, filenames in os.walk(directory):
    for filename in filenames:
        print(os.path.join(dirpath, filename))
import json
with open(file_path, 'r') as file:
    data = json.load(file)
import json
data = json.loads(json_string) # json_string: a str containing JSON (avoid naming it str, which shadows the built-in)
with open('json_file.json', 'w') as file:
    json.dump(json_list, file, indent=2)
lst = [x+x for x in lst]
data = list(map(lambda x: x.replace("\n", ""), data))
joined_lst = '; '.join(lst)
data = [list(row) for row in zip(*data)]
common = set(lst1).intersection(lst2) # do not name the result "set", that shadows the built-in
if common:
    …
combined = list(zip(lst1, lst2))
combined.sort(key=lambda x: x[1])
lst1, lst2 = zip(*combined)
lst1 = list(lst1)
lst2 = list(lst2)
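A worked example of sorting two lists together by the values of the second one. A minimal sketch with hypothetical sample data:

```python
names = ["c", "a", "b"]
scores = [3, 1, 2]

# Pair the items up, sort the pairs by score, then split them apart again.
combined = sorted(zip(names, scores), key=lambda pair: pair[1])
names_sorted, scores_sorted = (list(column) for column in zip(*combined))

print(names_sorted)   # ['a', 'b', 'c']
print(scores_sorted)  # [1, 2, 3]
```

Note that the unpacking step fails for empty lists, because zip(*combined) then yields nothing to unpack.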
import random
combined = list(zip(X, Y, Z))
random.shuffle(combined)
X, Y, Z = zip(*combined)
X = list(X)
Y = list(Y)
Z = list(Z)
from collections import Counter
count = Counter(lst)
print(count)
lst = list(set(lst))
from collections import Counter
count = Counter(lst)
doubles_count = {item: c for item, c in count.items() if c > 1}
doubles = [item for item in list1 if item in list2]
reversed_dict = dict(map(reversed, original_dict.items()))
import queue
q = queue.Queue()
q.qsize()
q.put(item)
q.get()
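queue.Queue is first-in, first-out, and put() needs the item to enqueue. A minimal sketch with hypothetical task names:

```python
import queue

q = queue.Queue()

# Enqueue three items; they come out in the same order (FIFO).
for task in ["load", "clean", "train"]:
    q.put(task)

size_before = q.qsize()
first = q.get()
size_after = q.qsize()

print(size_before, first, size_after)  # 3 load 2

# get() blocks on an empty queue; get_nowait() raises queue.Empty instead.
q.get()
q.get()
try:
    q.get_nowait()
    was_empty = False
except queue.Empty:
    was_empty = True
print(was_empty)  # True
```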
import pandas as pd
df = pd.read_csv(file_path, delimiter=";")
df.dropna(inplace=True)
for index, row in df.iterrows():
    column1_value = row["Column1"]
    column2_value = row["Column2"]
for column in df.columns:
    print(column, df[column].head())
for column_name, data in df.items():
    print(column_name, data.head()) # Example operation
third_row_value = df.iloc[2, 1]
import time
time.sleep(2)
from datetime import datetime
start=datetime.now()
print(f"Duration: {datetime.now()-start}")
from tqdm import tqdm
for i in tqdm(lst):
    ...
for index, row in enumerate(tqdm(lst)): # (for enumerate)
    ...
pbar = tqdm(total=total_number, desc="Load data", position=0, leave=True)
for item in lst: # any loop
    pbar.update(1)
pbar.close()
Count CPUs
import os
num_cores = os.cpu_count()
print("Number of CPUs:", num_cores)
import multiprocessing
from tqdm import tqdm
def extract_features(input_list, list_index, new_shared_list, lock):
    for x in tqdm(input_list, desc=f"Process {list_index}", position=list_index, leave=False):
        try:
            new_value … do something …
            with lock:
                new_shared_list.extend(new_value)
        except Exception as e:
            pass
manager = multiprocessing.Manager()
all_features = manager.list()
lock = manager.Lock()
num_processes = 4 # multiprocessing.cpu_count()
processes = []
for list_index in range(num_processes):
    # Assumes lst holds one sublist per process
    p = multiprocessing.Process(target=extract_features, args=(lst[list_index], list_index, all_features, lock))
    processes.append(p)
    p.start()
for process in processes:
    process.join()
Without new line
print("Hello,", end=" ")
print("world!")
import re
new_text = re.sub(" +", " ", text)
new_text = re.sub("[^a-zA-ZäöüÄÖÜß ]", "", text)
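Both substitutions can be chained into one cleaning step. A minimal sketch; the function name clean_text, the sample sentence, and the final strip() are additions for illustration. The character filter runs first here so that the gaps it leaves are collapsed afterwards:

```python
import re


def clean_text(text: str) -> str:
    """Keep only letters (including German umlauts) and spaces, then collapse repeated spaces."""
    letters_only = re.sub("[^a-zA-ZäöüÄÖÜß ]", "", text)
    return re.sub(" +", " ", letters_only).strip()


print(clean_text("Grüße,   Welt 2024!"))  # Grüße Welt
```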
- Avoid getting blocked -> download pages
- Lazy loading -> scroll to bottom
- Iframes -> driver.switch_to.frame("iframeClassOrID")
- https://medium.com/@pankaj_pandey/web-scraping-using-python-for-dynamic-web-pages-and-unveiling-hidden-insights-8dbc7da6dd26
- https://pypi.org/project/selenium-stealth/
pip install selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
page = ""
driver.get(page)
html_content = driver.page_source
with open("src/data/example.html", "w", encoding="utf-8") as file:
    file.write(html_content)
driver.get("file:///Users/kiliankramer/Desktop/example.html")
driver.close() # closes current window
driver.quit() # shuts down driver
if driver.find_elements("xpath", '//*[@id="L2AGLb"]'): # find_elements returns an empty list instead of raising if nothing matches
    input("CAPTCHA…")
    driver.find_element("xpath", '//*[@id="L2AGLb"]').click()
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "//div[@class='link-text padding-s ng-binding ng-scope']") # <- best
driver.find_element("id", "anyIDname")
driver.find_element(By.ID, "anyIDname")
driver.find_element(By.CLASS_NAME, "anyCLASSname")
driver.find_element(By.TAG_NAME, "p")
element.send_keys("input")
element.click()
inner_html = element.get_attribute("innerHTML") # string
outer_html = element.get_attribute("outerHTML") # string
text = element.text # visible text only
element_id = element.get_attribute("id")
from selenium.webdriver.common.by import By
# Both driver.find_element() and element.find_element() are possible:
all_elements = driver.find_elements(By.XPATH, "//*")
parent_element = element.find_element(By.XPATH, "..")
child_element = element.find_element(By.XPATH, "./*") # Finds first child element -> find_elements() would find all child elements of a specific HTML tag
predecessor_element = element.find_element(By.XPATH, "./preceding::*[1]")
successor_element = element.find_element(By.XPATH, "./following::*[1]")
Node Version Manager
brew install nvm
source $(brew --prefix nvm)/nvm.sh
nvm install 18
nvm install 21
nvm list
nvm use 18
nvm version
brew services list
brew services start mongodb/brew/mongodb-community@<version>
brew services stop mongodb/brew/mongodb-community@<version>
mongosh
mongosh "mongodb://localhost:27017"
quit
show dbs
use tutorial
db.dropDatabase()
db.createCollection('products')
show collections
db.products.drop()
https://artificialanalysis.ai
https://chat.lmsys.org/?leaderboard
https://huggingface.co/spaces/mteb/leaderboard
https://huggingface.co/blog/Pclanglais/common-corpus
https://huggingface.co/datasets
https://www.kaggle.com/datasets
https://commoncrawl.org
https://dumps.wikimedia.org
https://vast.ai
https://groq.com
https://ollama.com
https://github.com/settings/personal-access-tokens
see: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens