mirror of
https://github.com/Radiquum/furaffinity-dl.git
synced 2025-04-05 15:54:38 +00:00
"rewrited" some part of the code
Create directory tree based on author name Filter posts like "YCH OPEN, REMINDER" argument parser changes
This commit is contained in:
parent 77760e7135
commit 6e51fffed7

5 changed files with 376 additions and 211 deletions
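The title filter described in the commit message can be approximated as follows. This is a minimal standalone sketch: the `is_filtered` helper is illustrative (not a function from the repository), while the pattern list mirrors the one the commit adds to furaffinity-dl.py, which additionally compares the match against the full title.

```python
import re

# Patterns mirroring the filter set added in furaffinity-dl.py.
FILTER_PATTERN = re.compile(
    r"reminder|auction|YCH Open|YCH Closed|Adopt|pre-order", re.IGNORECASE
)


def is_filtered(title: str) -> bool:
    """Return True when a submission title matches one of the filtered patterns."""
    return FILTER_PATTERN.search(title) is not None


print(is_filtered("YCH OPEN - 3 slots!"))  # True
print(is_filtered("Sunset commission"))    # False
```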
4	.gitignore  (vendored)
@ -7,3 +7,7 @@ cookies.txt
 *.json
 *.gif
 *.swf
+
+# code monitoring stuff
+.vscode
+ignore
43	.pre-commit-config.yaml  (new file)
@ -0,0 +1,43 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+repos:
+-   repo: https://github.com/psf/black
+    rev: 22.3.0
+    hooks:
+    -   id: black
+        args: [--safe]
+
+-   repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.3.0
+    hooks:
+    -   id: trailing-whitespace
+    -   id: end-of-file-fixer
+    -   id: check-yaml
+    -   id: check-added-large-files
+    -   id: debug-statements
+        language_version: python3
+
+-   repo: https://github.com/PyCQA/flake8
+    rev: 4.0.1
+    hooks:
+    -   id: flake8
+        language_version: python3
+
+-   repo: https://github.com/asottile/reorder_python_imports
+    rev: v3.1.0
+    hooks:
+    -   id: reorder-python-imports
+        args: [--application-directories=.:src, --py39-plus]
+
+-   repo: https://github.com/asottile/pyupgrade
+    rev: v2.34.0
+    hooks:
+    -   id: pyupgrade
+        args: [--py39-plus]
+
+-   repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v0.961
+    hooks:
+    -   id: mypy
+        files: ^src/
+        args: []
45	README.md
@ -4,12 +4,15 @@ This branch is the development version of furaffinity-dl rewritten in python.

 **furaffinity-dl** is a python script for batch downloading of galleries (and scraps/favourites) from furaffinity users users or your submissons!
 It was written for preservation of culture, to counter the people nuking their galleries every once a while.
+and then modified for confinience.

 Supports all known submission types: images, text, flash and audio.

 ## Requirements

-`pip3 install -r requirements.txt`
+`python 3`
+
+`pip install -r requirements.txt`

 **The script currently only works with the "Modern" theme**

@ -21,48 +24,50 @@ When downloading a folder make sure to put everything after **/folder/**, for ex

 ```help
-usage: furaffinity-dl.py [-h] [--output OUTPUT] [--cookies COOKIES] [--ua UA] [--start START] [--stop STOP] [--dont-redownload]
-                         [--interval INTERVAL] [--metadir METADIR]
-                         [category] [username] [folder]
+usage: furaffinity-dl.py [-h] [--category [CATEGORY]] [--submissions] [--folder FOLDER] [--output OUTPUT] [--cookies COOKIES]
+                         [--user-agent [UA]] [--start [START]] [--stop STOP] [--dont-redownload] [--interval INTERVAL] [--rating]
+                         [--metadata]
+                         username

 Downloads the entire gallery/scraps/favorites of a furaffinity user, or your submissions

 positional arguments:
-  category              the category to download, gallery/scraps/favorites
-  username              username of the furaffinity user
-  folder                name of the folder (full path, for instance 123456/Folder-Name-Here)
+  username              username of the furaffinity user [required]

 options:
   -h, --help            show this help message and exit
-  --output OUTPUT, -o OUTPUT
-                        output directory
+  --category [CATEGORY], -ca [CATEGORY]
+                        the category to download, gallery/scraps/favorites [default: gallery]
+  --submissions, -su    download your submissions
+  --folder FOLDER       full path of the furaffinity folder. for instance 123456/Folder-Name-Here
+  --output OUTPUT       output directory [default: furaffinity-dl]
   --cookies COOKIES, -c COOKIES
                         path to a NetScape cookies file
-  --ua UA, -u UA        Your browser's useragent, may be required, depending on your luck
-  --start START, -s START
+  --user-agent [UA], -u [UA]
+                        Your browser's useragent, may be required, depending on your luck
+  --start [START], -s [START]
                         page number to start from
   --stop STOP, -S STOP  Page number to stop on. For favorites pages, specify the full URL after the username (1234567890/next).
   --dont-redownload, -d
-                        Don't redownload files that have already been downloaded
+                        Don't redownload files that have already been downloaded [default: true]
   --interval INTERVAL, -i INTERVAL
-                        delay between downloading pages
-  --metadir METADIR, -m METADIR
-                        directory to put meta files in
+                        delay between downloading pages in seconds [default: 0]
+  --rating, -r          enable rating separation [default: true]
+  --metadata, -m        enable downloading of metadata [default: false]

 Examples:
-python3 furaffinity-dl.py gallery koul
+python3 furaffinity-dl.py koul koul_gallery
 python3 furaffinity-dl.py -o koulsArt gallery koul
-python3 furaffinity-dl.py -o mylasFavs favorites mylafox
+python3 furaffinity-dl.py -o mylasFavs --category favorites mylafox

 You can also log in to FurAffinity in a web browser and load cookies to download Age restricted content or Submissions:
 python3 furaffinity-dl.py -c cookies.txt gallery letodoesart
-python3 furaffinity-dl.py -c cookies.txt msg submissions
+python3 furaffinity-dl.py -c cookies.txt --submissions

 DISCLAIMER: It is your own responsibility to check whether batch downloading is allowed by FurAffinity terms of service and to abide by them.

 ```

-You can also log in to download restricted content. To do that, log in to FurAffinity in your web browser, export cookies to a file from your web browser in Netscape format (there are extensions to do that [for Firefox](https://addons.mozilla.org/en-US/firefox/addon/ganbo/) and [for Chrome based browsers](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg)), you can then pass them to the script with the `-c` flag, like this (you may also have to provide your user agent):
+You can also log in to download restricted content. To do that, log in to FurAffinity in your web browser, export cookies to a file from your web browser in Netscape format (there are extensions to do that [for Firefox](https://addons.mozilla.org/en-US/firefox/addon/ganbo/) and [for Chrome based browsers](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid?hl=en)), you can then pass them to the script with the `-c` flag, like this (you may also have to provide your user agent):

 `python3 furaffinity-dl.py -c cookies.txt -u 'Mozilla/5.0 ....' gallery letodoesart`
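The cookie workflow the README describes can be sketched with the standard library's `http.cookiejar`. The `make_session` helper and the cookies.txt path are illustrative, but `MozillaCookieJar` is the same Netscape-format loader furaffinity-dl.py itself uses.

```python
import http.cookiejar as cookielib

import requests


def make_session(cookie_path: str, user_agent: str) -> requests.Session:
    """Build a requests session carrying browser cookies, as the script does."""
    session = requests.session()
    session.headers.update({"User-Agent": user_agent})
    jar = cookielib.MozillaCookieJar(cookie_path)
    jar.load()  # raises an error if the export is missing or malformed
    session.cookies = jar
    return session
```

Note that exported files must keep the Netscape header line (`# Netscape HTTP Cookie File`), or `load()` rejects them.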
425	furaffinity-dl.py  (Executable file → Normal file)
@ -1,99 +1,161 @@
 #!/usr/bin/python3
 import argparse
-from types import NoneType
-from tqdm import tqdm
-from argparse import RawTextHelpFormatter
-import json
-from bs4 import BeautifulSoup
-import requests
 import http.cookiejar as cookielib
-import re
+import json
 import os
+import re
 from time import sleep

-'''
-Please refer to LICENSE for licensing conditions.
-'''
+import requests
+from bs4 import BeautifulSoup
+from tqdm import tqdm

 # Argument parsing
-parser = argparse.ArgumentParser(formatter_class=RawTextHelpFormatter, description='Downloads the entire gallery/scraps/favorites of a furaffinity user, or your submissions', epilog='''
+parser = argparse.ArgumentParser(
+    formatter_class=argparse.RawTextHelpFormatter,
+    description="Downloads the entire gallery/scraps/favorites of a furaffinity user, or your submissions",
+    epilog="""
 Examples:
-python3 furaffinity-dl.py gallery koul
+python3 furaffinity-dl.py koul koul_gallery
 python3 furaffinity-dl.py -o koulsArt gallery koul
-python3 furaffinity-dl.py -o mylasFavs favorites mylafox\n
+python3 furaffinity-dl.py -o mylasFavs --category favorites mylafox\n
 You can also log in to FurAffinity in a web browser and load cookies to download Age restricted content or Submissions:
 python3 furaffinity-dl.py -c cookies.txt gallery letodoesart
-python3 furaffinity-dl.py -c cookies.txt msg submissions\n
+python3 furaffinity-dl.py -c cookies.txt --submissions\n
 DISCLAIMER: It is your own responsibility to check whether batch downloading is allowed by FurAffinity terms of service and to abide by them.
-''')
-parser.add_argument('category', metavar='category', type=str, nargs='?', default='gallery', help='the category to download, gallery/scraps/favorites')
-parser.add_argument('username', metavar='username', type=str, nargs='?', help='username of the furaffinity user')
-parser.add_argument('folder', metavar='folder', type=str, nargs='?', help='name of the folder (full path, for instance 123456/Folder-Name-Here)')
-parser.add_argument('--output', '-o', dest='output', type=str, default='.', help="output directory")
-parser.add_argument('--cookies', '-c', dest='cookies', type=str, default='', help="path to a NetScape cookies file")
-parser.add_argument('--ua', '-u', dest='ua', type=str, default='Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.7) Gecko/20100101 Firefox/68.7', help="Your browser's useragent, may be required, depending on your luck")
-parser.add_argument('--start', '-s', dest='start', type=str, default=1, help="page number to start from")
-parser.add_argument('--stop', '-S', dest='stop', type=str, default='', help="Page number to stop on. For favorites pages, specify the full URL after the username (1234567890/next).")
-parser.add_argument('--dont-redownload', '-d', const='dont_redownload', action='store_const', help="Don't redownload files that have already been downloaded")
-parser.add_argument('--interval', '-i', dest='interval', type=float, default=0, help="delay between downloading pages")
-parser.add_argument('--metadir', '-m', dest='metadir', type=str, default=None, help="directory to put meta files in")
+""",
+)
+
+# General stuff
+parser.add_argument(
+    "--category",
+    "-ca",
+    type=str,
+    nargs="?",
+    help="the category to download, gallery/scraps/favorites [default: gallery]",
+    const=1,
+    default="gallery",
+)
+parser.add_argument(
+    "--submissions",
+    "-su",
+    action="store_true",
+    help="download your submissions",
+)
+parser.add_argument(
+    "username",
+    type=str,
+    help="username of the furaffinity user [required]",
+)
+parser.add_argument(
+    "--folder",
+    type=str,
+    help="full path of the furaffinity folder. for instance 123456/Folder-Name-Here",
+)
+parser.add_argument(
+    "--output", type=str, default="furaffinity-dl", help="output directory [default: furaffinity-dl]"
+)
+parser.add_argument(
+    "--cookies",
+    "-c",
+    dest="cookies",
+    type=str,
+    help="path to a NetScape cookies file",
+)
+parser.add_argument(
+    "--user-agent",
+    "-u",
+    dest="ua",
+    type=str,
+    nargs="?",
+    default="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.7) Gecko/20100101 Firefox/68.7",
+    help="Your browser's useragent, may be required, depending on your luck",
+)
+parser.add_argument(
+    "--start", "-s", type=str, default=1, help="page number to start from", nargs="?"
+)
+parser.add_argument(
+    "--stop",
+    "-S",
+    dest="stop",
+    type=str,
+    help="Page number to stop on. For favorites pages, specify the full URL after the username (1234567890/next).",
+)
+parser.add_argument(
+    "--dont-redownload",
+    "-d",
+    action="store_false",
+    help="Don't redownload files that have already been downloaded [default: true]",
+)
+parser.add_argument(
+    "--interval",
+    "-i",
+    dest="interval",
+    type=float,
+    default=0,
+    help="delay between downloading pages in seconds [default: 0]",
+)
+parser.add_argument(
+    "--rating",
+    "-r",
+    action="store_false",
+    help="enable rating separation [default: true]",
+)
+parser.add_argument(
+    "--metadata",
+    "-m",
+    action="store_true",
+    help="enable downloading of metadata [default: false]",
+)

 args = parser.parse_args()

+BASE_URL = "https://www.furaffinity.net"
+categories = {
+    "gallery": "gallery",
+    "scraps": "scraps",
+    "favorites": "favorites",
+}
+category = categories.get(args.category)
+if category is None:
+    print("please enter a valid category")
+    exit()
 if args.username is None:
-    parser.print_help()
+    print("please enter a FA Username")
+    exit()
+if args.output is None:
+    print("please enter a output folder")
     exit()

-# Create output directory if it doesn't exist
-if args.output != '.':
-    os.makedirs(args.output, exist_ok=True)
+username = args.username
+output = f"{args.output}/{args.username}"
+metadata = f"{output}/metadata"

-if args.metadir == None:
-    args.metadir = args.output
-else:
-    os.makedirs(args.metadir, exist_ok=True)
-
-# Check validity of category
-valid_categories = ['gallery', 'favorites', 'scraps', 'msg']
-if args.category not in valid_categories:
-    raise Exception('Category is not valid', args.category)
-
-# Check validity of username
-if bool(re.compile(r'[^a-zA-Z0-9\-~._]').search(args.username)):
-    raise Exception('Username contains non-valid characters', args.username)
-
-# Initialise a session
+filter = {"YCH Open", "Reminder", "YCH Closed", "Auction"}

 session = requests.session()
-session.headers.update({'User-Agent': args.ua})
+session.headers.update({"User-Agent": args.ua})

-# Load cookies from a netscape cookie file (if provided)
-if args.cookies != '':
+if args.cookies is not None:
     cookies = cookielib.MozillaCookieJar(args.cookies)
     cookies.load()
     session.cookies = cookies

-base_url = 'https://www.furaffinity.net'
-gallery_url = '{}/{}/{}'.format(base_url, args.category, args.username)
-if args.folder is not None:
-    gallery_url += "/folder/"
-    gallery_url += args.folder
-page_num = args.start
-

 def download_file(url, fname, desc):
     r = session.get(url, stream=True)
     if r.status_code != 200:
-        print("Got a HTTP {} while downloading {}; skipping".format(r.status_code, fname))
+        print(f"Got a HTTP {r.status_code} while downloading {fname}; ...skipping")
         return False

-    total = int(r.headers.get('Content-Length', 0))
-    with open(fname, 'wb') as file, tqdm(
+    total = int(r.headers.get("Content-Length", 0))
+    with open(fname, "wb") as file, tqdm(
         desc=desc.ljust(40)[:40],
         total=total,
         miniters=100,
-        unit='b',
+        unit="b",
         unit_scale=True,
-        unit_divisor=1024
+        unit_divisor=1024,
     ) as bar:
         for data in r.iter_content(chunk_size=1024):
             size = file.write(data)
@ -101,161 +163,212 @@ def download_file(url, fname, desc):
     return True


-# The cursed function that handles downloading
+download_url = f"{BASE_URL}/{category}/{username}"
+if args.folder is not None:
+    download_url = f"{BASE_URL}/gallery/{username}/folder/{args.folder}"
+if args.submissions is True:
+    download_url = f"{BASE_URL}/msg/submissions"
+
+
 def download(path):
-    page_url = '{}{}'.format(base_url, path)
-    response = session.get(page_url)
-    s = BeautifulSoup(response.text, 'html.parser')
+    response = session.get(f"{BASE_URL}{path}")
+    s = BeautifulSoup(response.text, "html.parser")

     # System messages
-    if s.find(class_='notice-message') is not None:
-        message = s.find(class_='notice-message').find('div').find(class_="link-override").text.strip()
-        raise Exception('System Message', message)
+    if s.find(class_="notice-message") is not None:
+        message = (
+            s.find(class_="notice-message")
+            .find("div")
+            .find(class_="link-override")
+            .text.strip()
+        )
+        raise Exception("System Message", message)

-    image = s.find(class_='download').find('a').attrs.get('href')
-    title = s.find(class_='submission-title').find('p').contents[0]
+    image = s.find(class_="download").find("a").attrs.get("href")
+    title = s.find(class_="submission-title").find("p").contents[0]
     filename = image.split("/")[-1:][0]
     data = {
-        'id': int(path.split('/')[-2:-1][0]),
-        'filename': filename,
-        'author': s.find(class_='submission-id-sub-container').find('a').find('strong').text,
-        'date': s.find(class_='popup_date').attrs.get('title'),
-        'title': title,
-        'description': s.find(class_='submission-description').text.strip().replace('\r\n', '\n'),
+        "id": int(path.split("/")[-2:-1][0]),
+        "filename": filename,
+        "author": s.find(class_="submission-id-sub-container")
+        .find("a")
+        .find("strong")
+        .text,
+        "date": s.find(class_="popup_date").attrs.get("title"),
+        "title": title,
+        "description": s.find(class_="submission-description")
+        .text.strip()
+        .replace("\r\n", "\n"),
         "tags": [],
-        'category': s.find(class_='info').find(class_='category-name').text,
-        'type': s.find(class_='info').find(class_='type-name').text,
-        'species': s.find(class_='info').findAll('div')[2].find('span').text,
-        'gender': s.find(class_='info').findAll('div')[3].find('span').text,
-        'views': int(s.find(class_='views').find(class_='font-large').text),
-        'favorites': int(s.find(class_='favorites').find(class_='font-large').text),
-        'rating': s.find(class_='rating-box').text.strip(),
-        'comments': []
+        "category": s.find(class_="info").find(class_="category-name").text,
+        "type": s.find(class_="info").find(class_="type-name").text,
+        "species": s.find(class_="info").findAll("div")[2].find("span").text,
+        "gender": s.find(class_="info").findAll("div")[3].find("span").text,
+        "views": int(s.find(class_="views").find(class_="font-large").text),
+        "favorites": int(s.find(class_="favorites").find(class_="font-large").text),
+        "rating": s.find(class_="rating-box").text.strip(),
+        "comments": [],
     }

+    match = re.search(
+        "reminder|auction|YCH Open|YCH Closed|Adopt|pre-order", title, re.IGNORECASE
+    )
+    if match is not None and title == match.string:
+        print(f"post {title} was filtered and will not be downloaded")
+        return True
+
     # Extact tags
     try:
-        for tag in s.find(class_='tags-row').findAll(class_='tags'):
-            data['tags'].append(tag.find('a').text)
-    except:
-        pass
+        for tag in s.find(class_="tags-row").findAll(class_="tags"):
+            data["tags"].append(tag.find("a").text)
+    except AttributeError:
+        print(f'post: "{title}", has no tags')

+    image_url = f"https:{image}"
+    os.makedirs(output, exist_ok=True)
+    output_path = f"{output}/(unknown)"
+    if args.rating is True:
+        os.makedirs(f'{output}/{data.get("rating")}', exist_ok=True)
+        output_path = f'{output}/{data.get("rating")}/(unknown)'
+    if args.folder is not None:
+        os.makedirs(f"{output}/{args.folder}", exist_ok=True)
+        output_path = f"{output}/{args.folder}/(unknown)"
+        if args.rating is True:
+            os.makedirs(f'{output}/{args.folder}/{data.get("rating")}', exist_ok=True)
+            output_path = f'{output}/{args.folder}/{data.get("rating")}/(unknown)'
+
+    if args.dont_redownload is True and os.path.isfile(output_path):
+        print(f'Skipping "{title}", since it\'s already downloaded')
+    else:
+        download_file(image_url, output_path, title)
+
+    if args.metadata is True:
         # Extract comments
-    for comment in s.findAll(class_='comment_container'):
-        temp_ele = comment.find(class_='comment-parent')
-        parent_cid = None if temp_ele is None else int(temp_ele.attrs.get('href')[5:])
-        # Comment is deleted or hidden
-        if comment.find(class_='comment-link') is None:
-            continue
-
-        data['comments'].append({
-            'cid': int(comment.find(class_='comment-link').attrs.get('href')[5:]),
-            'parent_cid': parent_cid,
-            'content': comment.find(class_='comment_text').contents[0].strip(),
-            'username': comment.find(class_='comment_username').text,
-            'date': comment.find(class_='popup_date').attrs.get('title')
-        })
-
-    url = 'https:{}'.format(image)
-    output_path = os.path.join(args.output, filename)
-
-    if not args.dont_redownload or not os.path.isfile(output_path):
-        if not download_file(url, output_path, data["title"]):
-            return False
-    else:
-        print('Skipping "{}", since it\'s already downloaded'.format(data["title"]))
-        return True
-
-    # Write a UTF-8 encoded JSON file for metadata
-    with open(os.path.join(args.metadir, '{}.json'.format(filename)), 'w', encoding='utf-8') as f:
-        json.dump(data, f, ensure_ascii=False, indent=4)
+        for comment in s.findAll(class_="comment_container"):
+            temp_ele = comment.find(class_="comment-parent")
+            parent_cid = (
+                None if temp_ele is None else int(temp_ele.attrs.get("href")[5:])
+            )
+            # Comment is deleted or hidden
+            if comment.find(class_="comment-link") is None:
+                continue
+            data["comments"].append(
+                {
+                    "cid": int(
+                        comment.find(class_="comment-link").attrs.get("href")[5:]
+                    ),
+                    "parent_cid": parent_cid,
+                    "content": comment.find(class_="comment_text").contents[0].strip(),
+                    "username": comment.find(class_="comment_username").text,
+                    "date": comment.find(class_="popup_date").attrs.get("title"),
+                }
+            )
+
+        # Write a UTF-8 encoded JSON file for metadata
+        os.makedirs(metadata, exist_ok=True)
+        with open(
+            os.path.join(metadata, f"(unknown).json"), "w", encoding="utf-8"
+        ) as f:
+            json.dump(data, f, ensure_ascii=False, indent=4)

     return True

-global i
-i = 1
 # Main downloading loop
-while True:
-    if args.stop and args.stop == page_num:
-        print(f"Reached page {args.stop}, stopping.")
-        break
+def main():
+    page_end = args.stop
+    page_num = args.start
+    page_url = f"{download_url}/{page_num}"
+    response = session.get(page_url)
+    s = BeautifulSoup(response.text, "html.parser")
+    if s.find(class_="loggedin_user_avatar") is not None:
+        account_username = s.find(class_="loggedin_user_avatar").attrs.get("alt")
+        print("Logged in as", account_username)
+    else:
+        print("Not logged in, NSFW content is inaccessible")

-    page_url = '{}/{}'.format(gallery_url, page_num)
-    response = session.get(page_url)
-    s = BeautifulSoup(response.text, 'html.parser')
+    while True:
+        if page_end == page_num:
+            print(f"Reached page {page_end}, stopping.")
+            break

-    # Account status
-    if page_num == 1:
-        if s.find(class_='loggedin_user_avatar') is not None:
-            account_username = s.find(class_='loggedin_user_avatar').attrs.get('alt')
-            print('Logged in as', account_username)
-        else:
-            print('Not logged in, NSFW content is inaccessible')
+        page_url = f"{download_url}/{page_num}"
+        response = session.get(page_url)
+        s = BeautifulSoup(response.text, "html.parser")

-    # System messages
-    if s.find(class_='notice-message') is not None:
-        message = s.find(class_='notice-message').find('div').find(class_="link-override").text.strip()
-        raise Exception('System Message', message)
+        # System messages
+        if s.find(class_="notice-message") is not None:
+            try:
+                message = (
+                    s.find(class_="notice-message")
+                    .find("div")
+                    .find(class_="link-override")
+                    .text.strip()
+                )
+            except AttributeError:
+                print("You didn't provide cookies to log in")
+                break
+            raise Exception("System Message", message)

-    # End of gallery
-    if s.find(id='no-images') is not None:
-        print('End of gallery')
-        break
+        # End of gallery
+        if s.find(id="no-images") is not None:
+            print("End of gallery")
+            break

-    # Download all images on the page
-    for img in s.findAll('figure'):
-        download(img.find('a').attrs.get('href'))
-        sleep(args.interval)
+        # Download all images on the page
+        for img in s.findAll("figure"):
+            download(img.find("a").attrs.get("href"))
+            sleep(args.interval)

-    if args.category == "msg":
-        next_button = s.find('a', class_='button standard more', text="Next 48")
-        if next_button is None or next_button.parent is None:
-            next_button = s.find('a', class_='button standard more-half', text="Next 48")
-            if next_button is None or next_button.parent is None:
-                print('Unable to find next button')
-                break
+        if args.submissions is True:
+            next_button = s.find("a", class_="button standard more", text="Next 48")
+            if next_button is None or next_button.parent is None:
+                next_button = s.find(
+                    "a", class_="button standard more-half", text="Next 48"
+                )
+                if next_button is None or next_button.parent is None:
+                    print("Unable to find next button")
+                    break

-        next_page_link = next_button.attrs['href']
-        i = i + 1
-        page_num = next_page_link.split('/')[-2]
-        page_url = base_url + next_page_link
-
-        print('Downloading page', i, page_url)
-    elif args.category != "favorites":
-        next_button = s.find('button', class_='button standard', text="Next")
-        if next_button is None or next_button.parent is None:
-            print('Unable to find next button')
-            break
+            next_page_link = next_button.attrs["href"]
+            page_num = next_page_link.split("/")[-2]
+            page_url = BASE_URL + next_page_link

-        page_num = next_button.parent.attrs['action'].split('/')[-2]
-
-        print('Downloading page', page_num, page_url)
-    else:
-        next_button = s.find('a', class_='button mobile-button right', text="Next")
-        if next_button is None:
-            print('Unable to find next button')
-            break
+            print("Downloading page", page_num, page_url)
+        elif args.category != "favorites":
+            next_button = s.find("button", class_="button standard", text="Next")
+            if next_button is None or next_button.parent is None:
+                print("Unable to find next button")
+                break

-        # unlike galleries that are sequentially numbered, favorites use a different scheme.
-        # the "page_num" is instead: [set of numbers]/next (the trailing /next is required)
+            page_num = next_button.parent.attrs["action"].split("/")[-2]

-        next_page_link = next_button.attrs['href']
-        next_fav_num = re.search(r'\d+', next_page_link)
+            print("Downloading page", page_num, page_url)
+        else:
+            next_button = s.find("a", class_="button mobile-button right", text="Next")
+            if next_button is None:
+                print("Unable to find next button")
+                break

-        if next_fav_num == None:
-            print('Failed to parse next favorite link.')
-            break
+            # unlike galleries that are sequentially numbered, favorites use a different scheme.
+            # the "page_num" is instead: [set of numbers]/next (the trailing /next is required)

-        page_num = next_fav_num.group(0) + "/next"
+            next_page_link = next_button.attrs["href"]
+            next_fav_num = re.search(r"\d+", next_page_link)

-        # parse it into numbers/next
+            if next_fav_num is None:
+                print("Failed to parse next favorite link.")
+                break

-        print('Downloading page', page_num, page_url)
+            page_num = next_fav_num.group(0) + "/next"

-print('Finished downloading')
+            # parse it into numbers/next
+
+            print("Downloading page", page_num, page_url)
+
+    print("Finished downloading")
+
+
+if __name__ == "__main__":
+    main()
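The "Create directory tree based on author name" behaviour from the commit message boils down to nesting downloads under `<output>/<username>`, with optional folder and rating subdirectories. A minimal sketch under those assumptions (`build_output_path` is an illustrative helper, not a function from the script):

```python
import os


def build_output_path(output_root, username, filename, rating=None, folder=None):
    """Compose output/username[/folder][/rating]/filename and create the directories."""
    parts = [output_root, username]
    if folder is not None:
        parts.append(folder)
    if rating is not None:
        parts.append(rating)
    directory = os.path.join(*parts)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, filename)
```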