Revision of AEBN scraper #1291
Fixes the following problems:
- Performer Image not scraped anymore due to minor changes in the website
- Most of the metadata is not scraped anymore due to minor changes in the website (Birthdate, Country, Ethnicity, Nationality, Eye Color, Height, Weight, Fake Tits, Career Length, Twitter, Instagram)
- Birthdate not scraped properly in some cases
- Hair Color not scraped properly in some cases
- Measurements not scraped properly in some cases
- Gender defaults to female now
- Some cosmetic corrections to career length and details
- Added regex for removing references to further fields (Ethnicity, Eye Color, Fake Tits, Hair Color, Career Length, Aliases)
- Career Length: Maps "Present" and "Current" to empty string
- Country: Maps nationality to country
- Career Length: Maps em dash to hyphen
- Fake Tits: Maps "Enhanced" to "Fake" and "Natural" to "Natural"
- format
- removed fixed Gender as not all performers were female
- fixed twitter/instagram selectors
- tweaked a couple of regexes
- Implemented ability to scrape movie scenes
- Fixed movie performers not scraped properly
- Added functionality to scrape performer tattoos, piercings and aliases
… scene scraping
- Improved handling of tattoos/piercings/aliases during performer scraping
- Added handling of transgender performers
Thank you for this code. Been doing a bunch of AEBN scraping and didn't realize it wasn't getting all the performers.
Yeah, I also did not realize in the beginning that not all performers were scraped. I actually came across it while implementing this scraper. However, in my view the main advantage of my scraper is that it can scrape a single scene of a movie. That's really helpful if you have movies stored as split scenes, because scraping the metadata of the whole movie doesn't make much sense in that situation. AEBN is really good at providing metadata for each scene separately. I realized too late, however, that hotmovies is even better in this regard. So maybe I will implement something similar for hotmovies in the future.
Hi,
I propose this as an alternative/replacement to the existing AEBN scraper.
It has the same functionality as the existing AEBN scraper with one major additional feature:
It is now possible to scrape the metadata of a specific movie scene. This is helpful if you have a movie saved as split scenes (i.e. one file per movie scene). To scrape a single scene instead of the complete movie, do one of the following:
1. Enter the movie URL and append a separator plus the scene number to the URL. The default separators are plus, comma and full stop, but you can define your own in the header of the .py file. Example: if you want to scrape scene 2 of Kendras Obsession, enter https://straight.aebn.com/straight/movies/218523/kendras-obsession+2
2. Enter the name of the movie in the search field at the top of the AEBN site. You will get a list of scenes. If you hover over a scene, you will see the movie title and scene number. Find the scene you want and click the three dots ("Scene Information"). A popup window will appear. Copy the link at the top of the popup window (in this example the link "Kendras Obsession, scene 2") and hand it over to the scraper.
I personally prefer option 1, but it is up to you.
This way the scraper loads only the metadata (actors, tags, etc.) of that specific scene. The scraper randomly selects one of the scene thumbnails as the cover image. If the complete movie is scraped, it uses the movie cover.
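To illustrate the URL convention from option 1 and the cover selection, here is a minimal sketch. It is not the code from this PR: the names `SCENE_SEPARATORS`, `split_scene_url` and `pick_cover` are hypothetical, and falling back to the movie cover when a scene has no thumbnails is my own assumption.

```python
import random
import re

# Hypothetical stand-in for the separator setting in the header of the .py file.
# Defaults follow the description: plus, comma and full stop.
SCENE_SEPARATORS = "+,."


def split_scene_url(url, separators=SCENE_SEPARATORS):
    """Split 'movie-url<separator><scene number>' into (movie_url, scene_number).

    scene_number is None when the URL carries no scene suffix,
    i.e. the complete movie should be scraped.
    """
    url = url.strip()
    # A trailing run of digits preceded by exactly one separator character.
    pattern = r"^(.*?)[" + re.escape(separators) + r"](\d+)$"
    match = re.match(pattern, url)
    if match:
        return match.group(1), int(match.group(2))
    return url, None


def pick_cover(movie_cover, scene_thumbnails, scene_number):
    """Random scene thumbnail for a single scene, movie cover otherwise."""
    if scene_number is None or not scene_thumbnails:
        # Assumption: use the movie cover if the scene has no thumbnails.
        return movie_cover
    return random.choice(scene_thumbnails)
```

With the example above, `split_scene_url("https://straight.aebn.com/straight/movies/218523/kendras-obsession+2")` yields the plain movie URL and the scene number 2, so the scraper can load the movie page once and then restrict the metadata to that scene.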
Apart from that, I added some minor improvements:
Finally, as an "experimental" feature, I added the option to invoke the performer scraper from the scene scraper. If you set the option "scrape_performer_details" in the header of the .py file to true, the scene scraper will scrape the details for each performer of the scene. So, if you do not have a performer yet and create it from the scene scraping window, the performer detail fields will already be populated. Similarly, with the option "scrape_performer_images" enabled, the performer image will be available without the need to rescrape that performer. Of course you can combine both options. However, this admittedly slows down the scraper, since every performer of the scene has to be scraped as well. So I am not sure whether it is a reasonable option, and I am open to discussion on that.
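As a rough sketch of how the two options could gate the extra work (again not the actual code of this PR): `enrich_performers`, the `fetch_details` and `fetch_image` callables, and the `url` and `image` dictionary keys are all hypothetical names chosen for illustration. Only the two option names come from the description.

```python
# Hypothetical stand-ins for the options in the header of the .py file.
scrape_performer_details = False
scrape_performer_images = False


def enrich_performers(performers, fetch_details, fetch_image,
                      with_details=scrape_performer_details,
                      with_images=scrape_performer_images):
    """Return copies of the performer dicts, optionally extended.

    fetch_details(url) -> dict and fetch_image(url) -> str are assumed to
    wrap the performer scraper; each call costs extra page loads per
    performer, which is why both options are off by default here.
    """
    enriched = []
    for performer in performers:
        result = dict(performer)  # do not modify the caller's data
        if with_details:
            for key, value in fetch_details(performer["url"]).items():
                # Fields already scraped from the scene page take precedence.
                result.setdefault(key, value)
        if with_images:
            result["image"] = fetch_image(performer["url"])
        enriched.append(result)
    return enriched
```

With both options disabled the function just copies the performer list, so the default scene scrape pays no extra cost.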
Cheers!