Posters for IAFD #190

Open
JPH71 opened this issue Aug 3, 2022 · 19 comments
Labels
enhancement New feature or request

Comments

@JPH71

JPH71 commented Aug 3, 2022

IAFD has links to AEBN, GayHotMovies, GayDVDEmpire, and CD Universe, just like GEVI does.

With this in mind, I have put code in to scrape these external sites and get any data that is missing in IAFD, especially posters and background art.

Unfortunately, on running the ASP link that points to the shop, I get a 403 Forbidden result.
In Chrome developer tools, when I pick the link, I can see a Location entry in the response headers that points to the webpage, as it does in GEVI. I need to find out how to access this.

One of you helped with the GEVI issues some months ago by setting up a Referer header instance, which saved my bacon in more ways than one.
Could you give some suggestions in regard to this? The offending code is in utils.py, in the getFilmonIAFD function.
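
For reference, a minimal sketch of reading that Location entry without following the redirect, using only the Python 3 standard library. The names NoRedirect, redirect_target, and peek_redirect are made up for illustration and are not part of utils.py, and a plain request like this may still get the same 403 if the site filters non-browser clients.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def redirect_target(base_url, status, headers):
    """Return the absolute redirect URL, or None if this is not a redirect."""
    if status not in REDIRECT_CODES:
        return None
    location = headers.get('Location')
    return urljoin(base_url, location) if location else None


def peek_redirect(url, referer=None, timeout=30):
    """Request url once and report where it would redirect to (hypothetical helper)."""
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    if referer:
        request.add_header('Referer', referer)
    opener = urllib.request.build_opener(NoRedirect)
    try:
        response = opener.open(request, timeout=timeout)
        return redirect_target(url, response.status, response.headers)
    except urllib.error.HTTPError as err:
        # 3xx (and 403) responses arrive here; only a 3xx yields a target.
        return redirect_target(url, err.code, err.headers)
```

Calling peek_redirect on the shop link, optionally with the film page as referer, would return the shop URL if the server answers with a redirect, or None if it answers with 403.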

Cheers

Jason xx

@JPH71
Author

JPH71 commented Aug 3, 2022

Cody will have the code in the next 10 minutes.

@j-ktz

j-ktz commented Aug 10, 2022

It looks like the blog sites (Fagalicious) aren't pulling in posters now either, but it could be that our URL has expired.

@CodyBerenson CodyBerenson added the enhancement New feature or request label Aug 10, 2022
@CodyBerenson
Owner

(sorry for the duplicate post)

@fivedays555:

Hope this finds you well! @JPH71 wanted to once again say THANKS! Your quick solution above is going to allow him to add an enhancement to IAFD. For films, IAFD provides links to index sites that have the film's cover artwork, so Jason will be working on an enhancement that should allow the IAFD agent to crawl to film covers, since IAFD itself doesn't contain artwork other than actor headshots.

THANKS!

@fivedays555

Not a problem. Glad I can help. Let me know if you need any more information.

@JPH71
Author

JPH71 commented Aug 19, 2022 via email

@fivedays555

I tried the following, and it appears to work:

from lxml import html  # assumed: html here is lxml.html

url = 'https://www.iafd.com/shopclick.asp?sku=9344975'
response = get_scraper_request(url)  # cloudscraper-based helper
res = html.fromstring(response.text)
res.xpath('//*[@class="title"]')[0].text
>>> 'Fire Watch 2'

I think the direct request fails because IAFD uses Cloudflare to block unwanted requests.
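
One rough way to check that guess is to look at the response headers: responses served through Cloudflare normally carry a Server: cloudflare header and a CF-RAY header. The sketch below is only a heuristic, and looks_like_cloudflare_block is a hypothetical helper name, not something from the agent code.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: True when a 403/503 response appears to come from Cloudflare."""
    # Normalise header names and values, since plain dicts are case-sensitive.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    from_cloudflare = 'cloudflare' in lowered.get('server', '') or 'cf-ray' in lowered
    return status_code in (403, 503) and from_cloudflare
```

With a requests-style response this would be called as looks_like_cloudflare_block(response.status_code, response.headers).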

@JPH71
Author

JPH71 commented Aug 19, 2022 via email

@fivedays555

Should be. Otherwise, you won't be able to scrape IAFD.

@JPH71
Author

JPH71 commented Aug 19, 2022 via email

@fivedays555

No problem. I will put the function below.

import logging

import cloudscraper

# One shared scraper session, so headers and cookies persist across requests.
scraper = cloudscraper.create_scraper()

def get_scraper_request(url, **kwargs):
    """GET url through cloudscraper; return the response, or None if the request raised."""
    logging.info("Requesting: " + url)
    # Copy the caller's headers so adding a User-Agent does not modify their dict.
    headers = dict(kwargs.pop('headers', {}))
    cookies = kwargs.pop('cookies', {})
    timeout = kwargs.pop('timeout', 30)
    proxies = {}

    if 'User-Agent' not in headers:
        # headers['User-Agent'] = (fake_useragent.UserAgent(fallback='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15')).random
        headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15'

    scraper.headers.update(headers)
    scraper.cookies.update(cookies)

    # Start from None so the name is bound even when the request raises.
    scraper_request = None
    try:
        scraper_request = scraper.request(
            'GET', url, timeout=timeout, proxies=proxies)
    except Exception:
        logging.exception('CloudScraper Failed.')

    # Compare against None: a Response is falsy when its status is not OK,
    # so a plain truth test would skip this branch for every failed request.
    if scraper_request is not None and not scraper_request.ok:
        msg = ('< CloudScraper Failed Request Status Code: ' +
               str(scraper_request.status_code) + '>')
        logging.error(msg)

    return scraper_request
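
In case it helps, a small sketch of how a caller might wrap a helper like this with retries. It assumes the fetch function returns a requests-style response with an .ok attribute, or None on failure; fetch_with_retry and its parameters are illustrative names only, not part of the agent.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, wait_seconds=15, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times; return the first OK response or None."""
    for attempt in range(1, attempts + 1):
        try:
            response = fetch(url)
        except Exception:
            response = None  # treat a raised error like a failed attempt
        if response is not None and response.ok:
            return response
        if attempt < attempts:
            sleep(wait_seconds)  # pause before trying again
    return None
```

Usage would be along the lines of fetch_with_retry(get_scraper_request, url), checking the result for None before parsing it.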

@JPH71
Author

JPH71 commented Aug 19, 2022 via email

@fivedays555

Glad I can help. Cheers!

@JPH71
Author

JPH71 commented Aug 19, 2022 via email

@fivedays555

Not sure what you need, but IAFD has a very sensitive request rate limit. To be safe, I put a random delay of 10 to 20 seconds before each IAFD request:
time.sleep(randint(100, 200)/10)

And all IAFD requests need to go through the cloudscraper function.

Let me know if you need more information.
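
A slightly fuller sketch of that delay, for anyone who wants it in one place. Throttle is a hypothetical helper, not part of the agent; it keeps a random 10 to 20 second gap between consecutive requests, in the spirit of the one-liner above, but skips the wait before the first request and when enough time has already passed.

```python
import random
import time


class Throttle:
    """Enforce a random pause between consecutive requests."""

    def __init__(self, min_seconds=10.0, max_seconds=20.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_seconds = min_seconds
        self.max_seconds = max_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep until a randomly chosen gap has passed since the last call."""
        gap = random.uniform(self.min_seconds, self.max_seconds)
        now = self._clock()
        if self._last is not None:
            remaining = gap - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
        return gap
```

The idea is to create one Throttle for IAFD and call its wait() right before every IAFD request.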

@JPH71
Author

JPH71 commented Aug 19, 2022 via email

@fivedays555

Oh, I did not realize it was for Adult Film Database.

I never touch or use the Adult Film Database agent, so I don't really know...

Mostly, I am using Waybig, Fagalicious, Queerclick, and IAFD. They cover almost everything I need.

I took a look at Adult Film Database (https://www.adultfilmdatabase.com/), and I think there are very few gay titles there.
Why bother?

@CodyBerenson
Owner

@JPH71 Can this be closed?

@JPH71
Author

JPH71 commented Dec 29, 2022 via email

@JPH71
Author

JPH71 commented Dec 29, 2022 via email

4 participants