- Updated parser to work correctly with Fur Affinity's new tag-blocking feature on submission pages
- Fix detection of the next page of favorites
  - The selector path to the "Next" button had changed
  - The presence of the button is now tested
- Fix a possible issue where a cookie value could be set as `None` when using an `http.cookiejar.CookieJar` object
- Use requests ^2.32.3
- Use beautifulsoup4 ^4.12.3
- Use lxml ^5.3.0
- Use python-dateutil ^2.9.0
- Fix square brackets [] being removed from usernames
- Fix recursion limit error with chains of journal comments longer than ~1/6 of the recursion limit
- HTML content is not minified beyond basic stripping of whitespace characters
- Fix recursion limit error with chains of submission comments longer than ~1/6 of the recursion limit
- Fix incorrectly parsed usernames in comments with the OP (Original Poster) tag
- Use lxml ^4.9.3
- Remove htmlmin
- Support submissions with no or partial category
- The session class used for requests can be customized with the new `session_class` argument for `FAAPI`
- Remove cfscrape dependency
  - It had not been updated in years, and all requests succeeded with a normal `requests.Session` object
- Use requests ^2.31.0
- Fix CVE-2023-32681 issue
- Use beautifulsoup4 ^4.12.2
- Use lxml ^4.9.2
- Fix parsed URLs not being properly encoded if they contained non-allowed URL characters
- Fur Affinity UI update
- Support the new UI introduced on November 26, 2022
- Note: the new UI does not show comment parents yet, but the parent comment link is still present in the HTML inside an HTML comment, so the parser uses a regular expression to extract the parent ID; this could cause unforeseen issues, so be careful when parsing comments
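The regex approach described in the note above can be sketched as follows. The HTML snippet and the `#cid:<id>` anchor format are assumptions made for illustration (they are not taken from Fur Affinity's actual markup), and `extract_parent_id` is a hypothetical helper rather than part of faapi.

```python
import re
from typing import Optional

# Hypothetical markup: the parent-comment link survives only inside an HTML comment.
SAMPLE = '<div class="comment"><!-- <a href="#cid:12345">Parent</a> --><p>Reply</p></div>'

# Match a single HTML comment (never crossing its closing "-->") that contains
# an assumed "#cid:<digits>" anchor, and capture the digits.
PARENT_RE = re.compile(r"<!--(?:(?!-->).)*?#cid:(\d+)(?:(?!-->).)*?-->", re.DOTALL)


def extract_parent_id(html: str) -> Optional[int]:
    """Return the parent comment ID hidden in commented-out markup, or None."""
    match = PARENT_RE.search(html)
    return int(match.group(1)) if match else None


print(extract_parent_id(SAMPLE))  # 12345
print(extract_parent_id("<div>top-level comment</div>"))  # None
```

Restricting each match to a single HTML comment keeps the pattern from pairing an earlier, unrelated comment with a `#cid:` link elsewhere on the page; as the note warns, any such regex remains fragile if the markup changes.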
- User banners
  - Parse new user banners (when set)
  - New `User.banner_url` variable holds the banner URL
- Rename `User.user_icon_url` and `UserPartial.user_icon_url` to `User.avatar_url` and `UserPartial.avatar_url`
- Use flake8 ^6.0.0 for testing
- Remove implicit `Optional` types to comply with PEP 484
- Fix selectors for date tags in journals and submissions which sometimes caused the incorrect date to be selected
- Use mypy ^0.991
- Complies with PEP 484
- Improve parsing of usernames and statuses
- Thanks to PR #7 by @Xraydylan
- Fix parsing of user tags for folders when the user had no title set, or used bars (`|`) in their title
- Fix admins' username and status not being parsed correctly in watchlists and users tags
- Fix issue #6
- Users with non-alphanumeric characters in their name are now escaped in URLs
- From suggestion in issue #5
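Escaping a name for use in a URL path can be done with the standard library's `urllib.parse.quote`. The sketch below is an illustration only: `user_url` is a hypothetical helper, and the URL pattern is assumed rather than copied from faapi.

```python
from urllib.parse import quote


def user_url(name_url: str) -> str:
    """Build a user page URL with the name percent-encoded (hypothetical helper)."""
    # safe="" makes quote() encode "/" as well, so a name cannot add path segments;
    # letters, digits and "_.-~" are left unchanged.
    return "https://www.furaffinity.net/user/" + quote(name_url, safe="") + "/"


print(user_url("alice"))  # https://www.furaffinity.net/user/alice/
print(user_url("[bob]"))  # https://www.furaffinity.net/user/%5Bbob%5D/
```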
- Fix admins' username and status not being parsed correctly
- Fix issue #6
- Fix ` being removed from usernames
- Fix incorrect user icon URLs when converting BBCode to HTML
- Use pytest ^7.2.0
- Fix CVE-2022-42969 issue
- Submission footers
  - Submission footers are now separated from the submission description and stored in the `Submission.footer` field
  - The BBCode of the footer can be accessed with the `Submission.footer_bbcode` property
- Generate user icon URLs
  - New `generate_user_icon_url()` method added to `UserPartial` and `User` to create the URL for the current user icon
- BBCode to HTML conversion
  - Work-in-progress version of a BBCode converter based on the bbcode library
  - The converter function is located in the `parse` submodule: `faapi.parse.bbcode_to_html()`
  - The majority of HTML fields (submission descriptions, journal contents, comments, etc.) can be converted back and forth between HTML and BBCode without losing information
  - If a submission contains incorrect or very unusual BBCode tags or text, the BBCode to HTML conversion may create artifacts and tags that did not exist in the original content
- Added `Journal.header_bbcode` and `Journal.footer_bbcode` properties to convert `Journal.header` and `Journal.footer` to BBCode
- Return `None` instead of 0 (or `""` for favorites) when reaching the last page with `FAAPI.gallery()`, `FAAPI.scraps()`, `FAAPI.journals()`, `FAAPI.favorites()`, `FAAPI.watchlist_by()`, and `FAAPI.watchlist_to()`
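With `None` as the end-of-pages marker, a caller can follow pages with a single explicit check. The loop below is a sketch against a hypothetical `fetch` callable returning `(items, next_page)`; it does not call faapi, and the three-page gallery is fake data.

```python
from typing import Callable, Optional, Union

Page = Union[int, str]  # integer pages, or string identifiers as used for favorites
# Hypothetical fetcher: takes a page and returns (items, next_page),
# where next_page is None once the last page has been reached.
Fetcher = Callable[[Page], tuple[list[str], Optional[Page]]]


def collect_all(fetch: Fetcher, first_page: Page = 1) -> list[str]:
    """Follow next-page values until the fetcher reports None."""
    items: list[str] = []
    page: Optional[Page] = first_page
    while page is not None:
        page_items, page = fetch(page)
        items.extend(page_items)
    return items


# Fake three-page gallery used for demonstration only.
FAKE_PAGES = {1: (["a", "b"], 2), 2: (["c"], 3), 3: (["d"], None)}
print(collect_all(lambda p: FAKE_PAGES[p]))  # ['a', 'b', 'c', 'd']
```

Testing `is not None` makes the end condition explicit and behaves the same whether page values are integers or strings.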
- Added `__hash__` method to `User`, `UserPartial`, `Submission`, `SubmissionPartial`, `Journal`, `JournalPartial`, and `Comment`; the hash value is calculated using the same fields used for equality comparisons
- Improved cleanup of HTML fields by using htmlmin
- Fur Affinity URLs are now properly converted to relative `[url=<path>]` tags in BBCode
- Unknown tags are converted to `[tag=<name>.<classes>]` in BBCode
- Added `CookieDict(TypedDict)` notation for the cookies dictionary (alternative to `CookieJar`) to provide IntelliSense and type checking information
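A `TypedDict` lets a type checker validate the keys and value types of each cookie entry. The sketch assumes each entry holds a `name` and a `value`; the cookie names and placeholder values are illustrative, and `to_header` is a hypothetical helper that is not part of faapi.

```python
from typing import TypedDict


class CookieDict(TypedDict):
    # Assumed shape: one cookie name/value pair per entry.
    name: str
    value: str


def to_header(cookies: list[CookieDict]) -> str:
    """Join cookie entries into a Cookie header value (illustration only)."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


cookies: list[CookieDict] = [
    {"name": "a", "value": "placeholder-a"},
    {"name": "b", "value": "placeholder-b"},
]
print(to_header(cookies))  # a=placeholder-a; b=placeholder-b
```

At runtime these are plain dictionaries; the benefit is static: a checker such as mypy would flag a misspelled key like `"nmae"` in the list literal.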
- Fix comments being considered equal even if they had different parents but the same ID
- Fix line break tags (`<br/>`) not always being converted to newlines when converting to BBCode
- Fix errors when converting nav links (e.g. `[2,1,3]`) to BBCode
- Fix incorrect detection of the last page in `FAAPI.watchlist_by()` and `FAAPI.watchlist_to()`
- Fix errors when converting special characters (e.g. `&`)
- Fix trailing spaces around newlines remaining after converting to BBCode
- Fix horizontal lines not being correctly converted from BBCode if the dashes (`-----` or longer) were not surrounded by newlines
- Added htmlmin ^0.1.12
- Added bbcode ^1.1.0
- Improved HTML extraction for specific tags to avoid encoding issues
- HTML fields are cleaned up (i.e., removed newlines, carriage returns, and extra spaces)
- None of the parsed pages use tags with pre white space rendering, so no information is lost
- Improvements to BBCode conversion
- Do not quote URLs when converting to BBCode
- Support nested quote blocks
- Support non-specific tags (e.g. `div.submission-footer`) and convert them to `[tag.<tag name>.<tag class>][/tag.<tag name>]`
- Fix incorrect encoding of special characters (`<`, `>`, etc.) in HTML fields
  - Was caused by the previous method of extracting the inner HTML of a tag
- Fix URLs automatically shortened by Fur Affinity being converted to BBCode with the wrong text content
- Fix HTML paragraph tags (`<p>`) sometimes appearing in BBCode-converted content
- Fix BBCode conversion of `:usernameicon:` links (i.e., user icon links without the username)
- Submission user folders
  - Submission folders are now parsed and stored in a dedicated `user_folders` field in the `Submission` object
  - Each folder is stored in a `namedtuple` with fields for `name`, `url`, and `group` (if any)
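A `namedtuple` keeps each folder record lightweight while still allowing access by field name or by unpacking. In this sketch, only the field names `name`, `url`, and `group` come from the entry above; the `UserFolder` class name, the sample URLs, and the use of `None` for a missing group are assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in for the folder records described above.
# defaults=[None] makes `group` optional (assumed to be None when absent).
UserFolder = namedtuple("UserFolder", ["name", "url", "group"], defaults=[None])

folders = [
    UserFolder("Sketches", "/gallery/example/folder/1/Sketches/", "Art"),
    UserFolder("Misc", "/gallery/example/folder/2/Misc/"),  # no group
]

for name, url, group in folders:
    print(f"{group or '-'} / {name}: {url}")
```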
- BBCode conversion
  - New properties have been added to the `User`, `Submission`, `Journal`, `JournalPartial`, and `Comment` objects to provide BBCode versions of HTML fields
  - The generated BBCode tags follow the Fur Affinity standard found on their support page
- Use lxml ^4.9.1
- Fix CVE-2022-2309 issue
- Fix error when parsing journal folders and journal pages caused by the date format being set to full in Fur Affinity's site settings
- Requests timeout
  - New `FAAPI.timeout: int | None` variable to set the request timeout in seconds
  - The timeout is used for both page requests (e.g. submissions) and file requests
- Fix possible parsing error arising from multiple attributes in one tag
- Frontpage
  - New `FAAPI.frontpage()` method to get submissions from Fur Affinity's front page
- Sorting of `Journal`, `Submission`, and `User` objects
  - All data objects now support greater than, greater than or equal, less than, and less than or equal operations for easy sorting
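One way to get all four ordering operators from a single key is a dataclass with `order=True`, excluding non-key fields from comparison. This is a sketch of the general technique, not faapi's implementation; `Item` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Item:
    """Hypothetical data object ordered (and compared) by its numeric ID only."""

    id: int
    # compare=False keeps the title out of ==, <, <=, > and >=.
    title: str = field(default="", compare=False)


items = [Item(3, "c"), Item(1, "a"), Item(2, "b")]
print([item.id for item in sorted(items)])  # [1, 2, 3]
print(Item(2, "b") > Item(1, "a"))  # True
print(Item(1, "a") <= Item(1, "z"))  # True
```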
- Fix equality comparison between `Journal` and `JournalPartial`
- Fix parsing of usernames from user pages returning the title instead
- Caused by a change in Fur Affinity's DOM
- Journal headers and footers
  - The `Journal` class now contains header and footer fields, which are parsed from journal pages (`FAAPI.journal`)
- Submission favorite status and link
  - The `Submission` class now contains a boolean `favorite` field that is set to `True` if the submission is a favorite, and a `favorite_toggle_link` field containing the link to toggle the favorite status (`/fav/` or `/unfav/`)
- User watch and block statuses and links
  - The `User` class now contains boolean `watched` and `blocked` fields that are set to `True` if the user is watched or blocked respectively, and `watched_toggle_link` and `blocked_toggle_link` fields containing the links to toggle the watched (`/watch/` or `/unwatch/`) and blocked (`/block/` or `/unblock/`) statuses respectively
- Remove the `parse.check_page` function, which had no usage in the library anymore
- Remove the `parse.parse_search_submissions` function and the `FAAPI.search` method
  - They will be reintroduced once Fur Affinity allows scraping search pages again
- Fix an incorrect regular expression that parsed mentions in journals, submissions, and profiles, which could cause non-Fur Affinity links to be matched as valid
- Security issue #3
- Fix `FAAPI.journals` not detecting the next page correctly
  - Caused by a change in Fur Affinity's journals page
- Comments! 💬
  - A new `Comment` object is now used to store comments for submissions and journals
  - The comments are organised in a tree structure, and each one contains references to both its parent object (`Submission` or `Journal`) and, if the comment is a reply, to its parent comment too
  - The auxiliary functions `faapi.comment.flatten_comments` and `faapi.comment.sort_comments` can be used to flatten the comment tree or reorganise it
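Flattening a comment tree does not have to use recursion: an explicit stack sidesteps Python's recursion limit, a concern similar to the recursion-limit fixes for long comment chains listed near the top of this changelog. The `Comment` dataclass and `flatten` function below are hypothetical sketches, not `faapi.comment.flatten_comments` itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Comment:
    """Hypothetical comment node (not faapi's Comment class)."""

    id: int
    # compare/repr disabled: parent <-> reply links form cycles.
    parent: Optional["Comment"] = field(default=None, compare=False, repr=False)
    replies: list["Comment"] = field(default_factory=list, compare=False, repr=False)

    def reply(self, comment_id: int) -> "Comment":
        """Create a reply, linking it to this comment in both directions."""
        child = Comment(comment_id, parent=self)
        self.replies.append(child)
        return child


def flatten(roots: list["Comment"]) -> list["Comment"]:
    """Depth-first, pre-order flattening with an explicit stack instead of recursion."""
    result: list[Comment] = []
    stack = list(reversed(roots))
    while stack:
        node = stack.pop()
        result.append(node)
        # Push replies in reverse so the first reply is visited first.
        stack.extend(reversed(node.replies))
    return result


first, second = Comment(1), Comment(5)
reply = first.reply(2)
reply.reply(3)
first.reply(4)
print([c.id for c in flatten([first, second])])  # [1, 2, 3, 4, 5]

# A 50,000-deep reply chain would exceed the default recursion limit if walked recursively.
node = deep_root = Comment(0)
for i in range(1, 50_001):
    node = node.reply(i)
print(len(flatten([deep_root])))  # 50001
```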
- Separate `JournalPartial` and `Journal` objects
  - The new `JournalPartial` class takes the place of the previous `Journal` class, and it is now used only to parse journals from a user's journals folder
  - The new `Journal` class contains the same fields as `JournalPartial` with the addition of comments, and it is only used to parse journal pages
Comparisons
- All objects can now be used with the comparison (==) operator with other objects of the same type or the type of
their key property (
id: int
for submissions and journals, andname_url: str
for users)
- All objects can now be used with the comparison (==) operator with other objects of the same type or the type of
their key property (
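Comparing an object against its key value, and keeping `__hash__` consistent with that equality, can be sketched as below. `SubmissionSketch` is a hypothetical stand-in, not faapi's `Submission` class; it treats `id` as the key property, following the entry above.

```python
class SubmissionSketch:
    """Hypothetical model whose key property is the integer `id`."""

    def __init__(self, submission_id: int, title: str = "") -> None:
        self.id = submission_id
        self.title = title

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SubmissionSketch):
            return self.id == other.id
        if isinstance(other, int):
            # Allow comparing directly against the key value.
            return self.id == other
        return NotImplemented

    def __hash__(self) -> int:
        # Hash only the key so that equal objects always share a hash.
        return hash(self.id)


print(SubmissionSketch(5, "a") == SubmissionSketch(5, "b"))  # True
print(SubmissionSketch(5) == 5, SubmissionSketch(5) == 6)  # True False
print(len({SubmissionSketch(5, "a"), SubmissionSketch(5, "b")}))  # 1
```

Because the hash depends only on `id`, two objects with the same ID collapse to one set entry, and `5 in {SubmissionSketch(5)}` is also `True`, since the hashes match and `__eq__` accepts the integer.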
- The `cookies` argument of `FAAPI` is now mandatory, and an `Unauthorized` exception is raised if `FAAPI` is initialised with an empty cookies list
- The list of `Submission`/`Journal` objects returned by `FAAPI.gallery`, `FAAPI.scraps`, and `FAAPI.journals` now uses a shared `UserPartial` object in the `author` variable (i.e. changing a property of the author in one object of the list will change it for the others as well)
- Fix path checking against robots.txt not working correctly with paths missing a leading forward slash
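The leading-slash problem can be reproduced with the standard library's `urllib.robotparser.RobotFileParser`, which matches rules by path prefix. The robots.txt content below is made up, and `check_path` is a hypothetical helper rather than faapi's `FAAPI.check_path`.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /fav/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def check_path(path: str, user_agent: str = "*") -> bool:
    """Return True if robots.txt allows the path, normalising a missing leading slash."""
    if not path.startswith("/"):
        path = "/" + path
    return parser.can_fetch(user_agent, path)


print(check_path("/fav/123"), check_path("fav/123"), check_path("view/1"))  # False False True
```

Without the normalisation, `parser.can_fetch("*", "fav/123")` returns `True`, because the unprefixed path never starts with the `/fav/` rule.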
- New `Submission.stats` field for submission statistics stored in a named tuple (`views`, `comments` (count), `favorites`)
  - Pull request #2, thanks to @warpKaiba!
- New `Journal.stats` field for journal statistics stored in a named tuple (`comments` (count))
- Rename `UserStats.favs` to `UserStats.favorites`
- Fix links in PyPI metadata pointing to previous hosting at GitLab
- Better and more resilient robots.txt parsing
- Fix spaces around slash (/) not being preserved for submission categories
- Raise `DisabledAccount` for users pending deletion
- Error messages from the server are not converted to lowercase
- Fix rare occurrence of an error message not being parsed if inside a `section.notice-message`
- New `NotFound` exception inheriting from `ParsingError`
- Removed `FAAPI.submission_exists`, `FAAPI.journal_exists`, and `FAAPI.user_exists` methods
- Improved reliability of error pages' parser
- Custom exceptions inherit from `Exception` instead of `BaseException`
- No changes to code; migrated repository to GitHub and updated README and PyPI metadata
- Allow empty info/contacts when parsing user profiles
- Fix last page check when parsing galleries
- Use BaseException as base class of custom exceptions
- Use requests ^2.27.1
- Allow submission thumbnail tag to be null
- Use `UserStats` class to hold user statistics instead of a namedtuple
- Add watched by and watching stats to `UserStats`
- Safer parsing
- Add docstrings
- Handle robots.txt parsing with `urllib.robotparser.RobotFileParser`
- `User-Agent` header is exposed as the `FAAPI.user_agent` property
- `FAAPI.last_get` uses UNIX time
- `FAAPI.check_path` doesn't raise an exception by default
- `FAAPI.login_status` does not raise an exception on unauthorized
- Remove crawl delay error
- Improve download of files
- `FAAPI.get_parsed` checks login status and checks the page for errors directly (both can be manually skipped)
- Add `Unauthorized` exception
- `FAAPI.submission` and `FAAPI.submission_file` support setting the chunk size for the binary file download
- The file downloader uses chunk size instead of speed
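Reading a binary body in fixed-size chunks can be sketched around the `Response.iter_content(chunk_size=...)` method from requests (used together with `stream=True` on the request). To keep the example offline, `FakeResponse` is a hypothetical stand-in that only mimics `iter_content`; `read_file` is likewise illustrative and not faapi's downloader.

```python
from typing import Iterable, Iterator, Protocol


class ChunkedResponse(Protocol):
    """Anything exposing iter_content(chunk_size=...), as requests.Response does."""

    def iter_content(self, chunk_size: int = 1) -> Iterable[bytes]: ...


def read_file(response: ChunkedResponse, chunk_size: int = 1024) -> bytes:
    """Accumulate a binary body chunk by chunk."""
    buffer = bytearray()
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty chunks, e.g. keep-alive ones
            buffer.extend(chunk)
    return bytes(buffer)


class FakeResponse:
    """Offline stand-in that yields a fixed body in chunk_size slices."""

    def __init__(self, body: bytes) -> None:
        self.body = body

    def iter_content(self, chunk_size: int = 1) -> Iterator[bytes]:
        for start in range(0, len(self.body), chunk_size):
            yield self.body[start:start + chunk_size]


print(read_file(FakeResponse(b"binary file body"), chunk_size=4))  # b'binary file body'
```

The chunk size only controls how much is read per iteration; this sketch still keeps the whole body in memory, so writing each chunk straight to a file would be the natural variant for large downloads.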
- When raising `ServerError` and `NoticeMessage`, the actual messages appearing on the page are used as exception arguments
- Add support for `http.cookiejar.CookieJar` (and inheriting classes, like `requests.cookies.RequestsCookieJar`) for cookies
- Add `FAAPI.me()` method to get the logged-in user
- Add `FAAPI.login_status` property to get the current login status
- Use lxml ^4.7.1
- Fix CVE-2021-43818 issue
- Fix rare error when parsing the info section of a userpage
- Fix a key error in `Submission` when assigning the parsed results
- Upgrade to Python 3.9+
- Update type annotations
- `Submission` parses next and previous submission IDs
- `FAAPI.watchlist_by()` and `FAAPI.watchlist_to()` methods support multiple watchlist pages
- Renamed `FAAPI.get_parse` to `get_parsed`
- Removed the `get` prefix from `FAAPI` methods (e.g. `get_submission` to `submission`) and return a list of `UserPartial` objects instead of `User` objects
- Added `__all__` declarations to allow importing exceptions and secondary functions from `connection` and `parse`
- `datetime` fields are not serialised on `__iter__` (e.g. when casting a `Submission` object to `dict`)