Discogs Plus API

This project extends the filtering capabilities of Discogs using AWS services. It provides a REST API for querying a dataset of electronic music releases enriched with additional metadata such as wants, haves, ratings, and pricing data. The idea is that you can use custom filters such as: fetch records that have between 100 and 200 wants, between 50 and 100 haves, and whose label has only one release. Note: the enriched data was scraped fairly recently but will not be updated regularly, so some release statistics may differ slightly from what you currently see on Discogs.

Architecture Overview

AWS Services Used

  • Amazon S3: Stores the partitioned dataset in Parquet.
  • AWS Athena: Executes SQL queries on the S3 dataset.
  • AWS Lambda: Handles API requests and queries Athena.
  • Amazon API Gateway: Exposes the API endpoints for users.
  • AWS CloudWatch: Logs API calls and Lambda executions.
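
To make the Lambda-to-Athena step more concrete, here is a minimal sketch of how a handler could run a query and return the first page of results. It is not taken from this repository: the function name `run_athena_query` and its arguments are hypothetical, and the `athena` argument is assumed to behave like `boto3.client("athena")`.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, return the first page.

    `athena` is assumed to behave like boto3.client("athena"); the database
    name and S3 output location are supplied by the caller.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    page = athena.get_query_results(QueryExecutionId=query_id)
    return {
        "query_execution_id": query_id,
        "next_token": page.get("NextToken"),  # absent on the last page
        "rows": page["ResultSet"]["Rows"],
    }
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without touching AWS.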

Project Structure

  • src/ - Contains the source code for the project
    • lambda_function/ - Lambda-related code
    • sql/ - SQL scripts for Athena and other queries
  • tests/ - Unit tests

Endpoint & Request Parameters

Base URL

https://n0bel2lf6a.execute-api.us-east-1.amazonaws.com/prod/

Endpoint

GET /discogs_plus

Parameters

The release_year filter is required, while all other parameters are optional. The API dynamically constructs the query based on the provided filters.

Numeric Filters

Each numeric field can be filtered using:

  • greater_than → Filters for values greater than a number.
  • less_than → Filters for values less than a number.
  • between → Filters for values between two numbers (comma-separated).
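
Judging by the examples further down (`want_greater_than`, `have_between`), a query parameter name is the field name joined to one of these suffixes with an underscore. A small client-side helper, sketched here under that assumption (`numeric_filter` is a hypothetical name, not part of the API), can build such parameters:

```python
def numeric_filter(field, operator, *values):
    """Return a one-item dict such as {"want_greater_than": "150"}."""
    if operator not in ("greater_than", "less_than", "between"):
        raise ValueError(f"unknown operator: {operator}")
    expected = 2 if operator == "between" else 1
    if len(values) != expected:
        raise ValueError(f"{operator} takes {expected} value(s)")
    # Values are sent as strings; a range is comma-separated.
    return {f"{field}_{operator}": ",".join(str(v) for v in values)}


params = {
    **numeric_filter("want", "greater_than", 150),
    **numeric_filter("release_year", "between", 1990, 2006),
}
```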

Available Filters

  • want_to_have_ratio – Ratio of wants to haves.
  • have – Number of users who own the release.
  • want – Number of users who want the release.
  • avg_rating – Average rating of the release (1-5).
  • median_price – Median sale price of the release.
  • ratings – Number of ratings for the release.
  • video_count – Number of videos linked to the release.
  • n_styles – Number of styles associated with the release.
  • release_year – Release year range (required).
  • n_releases_on_label – Number of releases on the label.
  • country – Country of release (comma-separated).
  • styles – Music styles filter.
    • styles_exact – Filters for an exact match of all specified styles.
    • styles_contains – Filters for records that include the specified style(s), but may also have others.
  • limit – Maximum number of records to return. The limit is currently capped at 1000 records per request.
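
The dynamic query construction mentioned under Parameters could look roughly like the sketch below. It is an illustration, not the code in `src/lambda_function/`: the function name, the choice to coerce every value to a number, and the use of SQL `BETWEEN` (which is inclusive) are all assumptions. Only the numeric filters are handled.

```python
import math

NUMERIC_FIELDS = {
    "want_to_have_ratio", "have", "want", "avg_rating", "median_price",
    "ratings", "video_count", "n_styles", "release_year", "n_releases_on_label",
}
# suffix -> (SQL template, number of values expected)
OPERATORS = {
    "_greater_than": ("{field} > {0}", 1),
    "_less_than": ("{field} < {0}", 1),
    "_between": ("{field} BETWEEN {0} AND {1}", 2),
}
MAX_LIMIT = 1000


def build_conditions(params):
    """Return (where_clause, limit) for the numeric filters in `params`."""
    if not any(key.startswith("release_year_") for key in params):
        raise ValueError("a release_year filter is required")
    conditions = []
    for key, raw in params.items():
        for suffix, (template, arity) in OPERATORS.items():
            field = key[: -len(suffix)]
            if not key.endswith(suffix) or field not in NUMERIC_FIELDS:
                continue
            # float() rejects non-numeric text, so raw request input
            # is never pasted into the SQL string.
            numbers = [float(part) for part in str(raw).split(",")]
            if len(numbers) != arity or not all(map(math.isfinite, numbers)):
                raise ValueError(f"bad value for {key}: {raw!r}")
            conditions.append(template.format(*numbers, field=field))
    limit = min(int(params.get("limit", MAX_LIMIT)), MAX_LIMIT)
    return " AND ".join(conditions), limit
```

Field names come from a fixed allow-list and values are parsed as numbers, which keeps arbitrary request text out of the generated SQL.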

Examples

Example API Call:

import requests

url = "https://n0bel2lf6a.execute-api.us-east-1.amazonaws.com/prod/discogs_plus"

# Every value is sent as a string; ranges are comma-separated.
params = {
    "want_greater_than": "150",
    "have_between": "0,200",
    "release_year_between": "1990,2006",  # required
    "video_count_less_than": "3",
    "styles_exact": "Electro",
    "country": "US,UK",
    "limit": "500",
}

response = requests.get(url, params=params)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error body

print(response.json())

Example API Call (Pagination):

import requests

url = "https://n0bel2lf6a.execute-api.us-east-1.amazonaws.com/prod/discogs_plus"

params = {
    "release_year_between": "1992,2006",
    "styles_exact": "Electro",
    "n_records": "1000",  # not listed under Available Filters; "limit" is the documented cap
    "country": "US,UK",
}

data = []
counter = 1
while True:
    response = requests.get(url, params=params)
    if response.status_code != 200:
        print(f"Error fetching data: {response.status_code}")
        break
    response_json = response.json()
    batch = response_json.get("data") or []
    data.extend(batch)
    print(f"Batch {counter} fetched {len(batch)}")
    # A missing next_token means the last page has been reached.
    if response_json.get("next_token") is None:
        break
    # Send both values back so the next request continues the same query.
    params["query_execution_id"] = response_json.get("query_execution_id")
    params["next_token"] = response_json.get("next_token")
    counter += 1
print(data)
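
The same loop can be wrapped in a generator so callers consume records one at a time. This is a sketch only: `iter_records` and `get_page` are hypothetical names, and the response is assumed to carry the `data`, `next_token`, and `query_execution_id` keys used in the loop above.

```python
def iter_records(get_page, params):
    """Yield records one by one, following next_token until it runs out.

    `get_page` is any callable that takes a params dict and returns the
    parsed JSON response as a dict.
    """
    page_params = dict(params)  # leave the caller's dict untouched
    while True:
        page = get_page(page_params)
        yield from page.get("data") or []
        token = page.get("next_token")
        if token is None:
            return
        # Pass both values back so the next call continues the same query.
        page_params["query_execution_id"] = page.get("query_execution_id")
        page_params["next_token"] = token
```

With `requests`, `get_page` could be `lambda p: requests.get(url, params=p).json()`. Keeping the HTTP call outside the generator makes the paging logic easy to test with canned responses.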

Example Response:

[{'release_id': '1064730',
  'release_url': 'https://www.discogs.com/release/1064730',
  'release_title': 'Dance',
  'artist': 'The Action Pack',
  'label_name': 'D-Bass Records',
  'catno': 'DBR-50101'},
 {'release_id': '1382356',
  'release_url': 'https://www.discogs.com/release/1382356',
  'release_title': 'Technology',
  'artist': 'Aux 88',
  'label_name': 'Direct Beat',
  'catno': 'DB4W-002'},
 {'release_id': '428814',
  'release_url': 'https://www.discogs.com/release/428814',
  'release_title': 'Untitled',
  'artist': 'Industrial Bass Machine/ Ash Rock',
  'label_name': 'Cyberian Knights Records',
  'catno': 'CKR008'}
]
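
Because each record is a flat dictionary, results can be written out with the standard library alone. The sketch below assumes records carry the six keys shown in the example response; `records_to_csv` is a hypothetical helper name.

```python
import csv
import io

FIELDS = ["release_id", "release_url", "release_title", "artist", "label_name", "catno"]


def records_to_csv(records):
    """Serialize a list of release dicts to CSV text."""
    buffer = io.StringIO()
    # Unknown keys are ignored; missing keys become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```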