This project extends the filtering capabilities of Discogs using AWS services. It provides a REST API for querying a dataset of electronic music releases enriched with additional metadata such as wants, haves, ratings, and pricing data. The goal is to support custom filters such as: fetch records that have between 100 and 200 wants and 50 to 100 haves, on a label that has only one release. Note: the enriched data was scraped fairly recently but will not be updated regularly, so some release statistics may differ slightly from what you currently see on Discogs.
- Amazon S3: Stores the partitioned dataset in Parquet.
- Amazon Athena: Executes SQL queries on the S3 dataset.
- AWS Lambda: Handles API requests and queries Athena.
- Amazon API Gateway: Exposes the API endpoints for users.
- Amazon CloudWatch: Logs API calls and Lambda executions.
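As a rough illustration of how the Lambda and Athena pieces could fit together, here is a minimal sketch of the start, poll, and fetch cycle. It is not the project's actual Lambda code. The function name `run_athena_query`, the polling defaults, and the shape of the returned dictionary are assumptions chosen to mirror the `query_execution_id` / `next_token` fields used in the pagination example further down. The `client` argument is expected to behave like `boto3.client("athena")`.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start an Athena query, wait for it to finish, return the first page of rows.

    `client` is assumed to expose the boto3 Athena client methods
    start_query_execution, get_query_execution and get_query_results.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    for _ in range(max_polls):
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    page = client.get_query_results(QueryExecutionId=query_id)
    rows = page["ResultSet"]["Rows"]
    # For SELECT queries the first row of the first page holds the column names.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    records = [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]
    return {
        "query_execution_id": query_id,
        "next_token": page.get("NextToken"),  # absent on the last page
        "data": records,
    }
```

In a real handler this would be called with `boto3.client("athena")`, a database name, and an S3 output location. All three are deployment-specific and not documented here.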
- src/: Contains the source code for the project
- lambda_function/: Lambda-related code
- sql/: SQL scripts for Athena and other queries
- tests/: Unit tests
https://n0bel2lf6a.execute-api.us-east-1.amazonaws.com/prod/
GET /discogs_plus
The release_year filter is required, while all other parameters are optional. The API dynamically constructs the query based on the provided filters.
Each numeric field can be filtered by appending one of the following suffixes to the field name (for example, want_greater_than):
- greater_than: Filters for values greater than a number.
- less_than: Filters for values less than a number.
- between: Filters for values between two numbers (comma-separated).
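To make the suffix convention concrete, the sketch below shows one way a parameter such as `want_greater_than=150` could be turned into a SQL condition on the server side. This is an illustrative assumption, not the project's actual query builder. The names `NUMERIC_FIELDS`, `numeric_condition`, and `build_where` are hypothetical, and the field whitelist is simply copied from the numeric fields documented in this README.

```python
# Hypothetical whitelist, taken from the numeric fields listed in this README.
NUMERIC_FIELDS = {
    "want_to_have_ratio", "have", "want", "avg_rating", "median_price",
    "ratings", "video_count", "n_styles", "release_year", "n_releases_on_label",
}
OPERATORS = ("greater_than", "less_than", "between")


def numeric_condition(param_name, raw_value):
    """Translate e.g. ("want_greater_than", "150") into "want > 150.0"."""
    for op in OPERATORS:
        suffix = "_" + op
        if param_name.endswith(suffix):
            column = param_name[: -len(suffix)]
            break
    else:
        raise ValueError(f"No known operator suffix in {param_name!r}")

    # Only whitelisted column names ever reach the SQL string.
    if column not in NUMERIC_FIELDS:
        raise ValueError(f"Unknown numeric field {column!r}")

    # Casting to float rejects anything that is not a plain number.
    if op == "between":
        low, high = (float(part) for part in raw_value.split(","))
        return f"{column} BETWEEN {low} AND {high}"
    symbol = ">" if op == "greater_than" else "<"
    return f"{column} {symbol} {float(raw_value)}"


def build_where(params):
    """Join the conditions for all numeric parameters with AND."""
    return " AND ".join(numeric_condition(k, v) for k, v in params.items())
```

Checking the column against a fixed set and casting every value to `float` keeps user input out of the SQL text, which matters whenever a query string is assembled from request parameters.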
- want_to_have_ratio: Ratio of wants to haves.
- have: Number of users who own the release.
- want: Number of users who want the release.
- avg_rating: Average rating of the release (1-5).
- median_price: Median sale price of the release.
- ratings: Number of ratings for the release.
- video_count: Number of videos linked to the release.
- n_styles: Number of styles associated with the release.
- release_year: Release year range (required).
- n_releases_on_label: Number of releases by the label.
- country: Country of release (comma-separated).
- styles: Music styles filter.
- styles_exact: Filters for an exact match of all specified styles.
- styles_contains: Filters for records that include the specified style(s), but may also have others.
- limit: Number of records to return. The limit is currently capped at 1000 records per request.
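On the client side, the rules above (a required release year range, comma-separated values, and a cap of 1000 on limit) can be wrapped in a small helper. The following is only a sketch; `build_params` and `MAX_LIMIT` are hypothetical names and not part of the API. It assumes the release year is always sent as `release_year_between`, as in the request examples in this README.

```python
MAX_LIMIT = 1000  # per-request cap stated in this README


def build_params(release_year_between, limit=MAX_LIMIT, **filters):
    """Build the query-string dict for GET /discogs_plus.

    release_year_between: (start, end) pair, always required.
    filters: parameter name -> number, string, or list/tuple of values
             (lists and tuples are joined with commas).
    """
    start, end = release_year_between
    params = {
        "release_year_between": f"{start},{end}",
        # Clamp on the client so a larger value is never requested.
        "limit": str(min(int(limit), MAX_LIMIT)),
    }
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            params[name] = ",".join(str(v) for v in value)
        else:
            params[name] = str(value)
    return params
```

For example, `build_params((1990, 2006), limit=500, want_greater_than=150, have_between=(0, 200), country=["US", "UK"])` produces a dictionary that can be passed directly as `params=` to `requests.get`.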
import requests

url = "https://n0bel2lf6a.execute-api.us-east-1.amazonaws.com/prod/discogs_plus"
params = {
    "want_greater_than": "150",
    "have_between": "0,200",
    "release_year_between": "1990,2006",
    "video_count_less_than": "3",
    "styles_exact": "Electro",
    "country": "US,UK",
    "limit": "500",
}
response = requests.get(url, params=params)
print(response.json())
import requests

url = "https://n0bel2lf6a.execute-api.us-east-1.amazonaws.com/prod/discogs_plus"
params = {
    "release_year_between": "1992,2006",
    "styles_exact": "Electro",
    "limit": "1000",
    "country": "US,UK",
}
data = []
counter = 1
# Keep requesting pages until the API stops returning a next_token.
while True:
    response = requests.get(url, params=params)
    if response.status_code != 200:
        print(f"Error fetching data: {response.status_code}")
        break
    response_json = response.json()
    batch = response_json.get("data") or []
    data.extend(batch)
    print(f"Batch {counter} fetched {len(batch)}")
    next_token = response_json.get("next_token")
    if next_token is None:
        break
    # Send the pagination state back so the next request continues the same query.
    params["query_execution_id"] = response_json.get("query_execution_id")
    params["next_token"] = next_token
    counter += 1
print(data)
[{'release_id': '1064730',
'release_url': 'https://www.discogs.com/release/1064730',
'release_title': 'Dance',
'artist': 'The Action Pack',
'label_name': 'D-Bass Records',
'catno': 'DBR-50101'},
{'release_id': '1382356',
'release_url': 'https://www.discogs.com/release/1382356',
'release_title': 'Technology',
'artist': 'Aux 88',
'label_name': 'Direct Beat',
'catno': 'DB4W-002'},
{'release_id': '428814',
'release_url': 'https://www.discogs.com/release/428814',
'release_title': 'Untitled',
'artist': 'Industrial Bass Machine/ Ash Rock',
'label_name': 'Cyberian Knights Records',
'catno': 'CKR008'}
]