Website Crawling

The application needs to be able to read data from established hentai sites. This includes the manga themselves, their metadata (tags, author, etc.), and user data (lists, ratings, etc.).

This is derived from user stories #4, #5, #6.

None of these sites has an official API, which means the data needs to be read from the HTML of their pages.
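As a rough illustration of what reading data from HTML involves, the following sketch pulls a title and a list of tags out of a page using only Python's standard library. The markup in `SAMPLE` and the class names `title` and `tag` are assumptions made up for this example; each real site would need its own rules, and the application itself would likely do this in Electron's JavaScript environment rather than in Python.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects a title and tag names from a gallery page.

    The class names "title" and "tag" are hypothetical; every real
    site needs its own extraction rules.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.tags = []
        self._capture = None  # "title", "tag", or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "title" in classes:
            self._capture = "title"
        elif tag == "a" and "tag" in classes:
            self._capture = "tag"
            self.tags.append("")

    def handle_data(self, data):
        # Only text inside a captured element is recorded.
        if self._capture == "title":
            self.title += data
        elif self._capture == "tag":
            self.tags[-1] += data

    def handle_endtag(self, tag):
        if (self._capture == "title" and tag == "h1") or (
            self._capture == "tag" and tag == "a"
        ):
            self._capture = None


def parse_metadata(html):
    """Return the title and tags found in an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "tags": [t.strip() for t in parser.tags],
    }


# Made-up markup standing in for a downloaded page.
SAMPLE = """
<html><body>
  <h1 class="title">Example Title</h1>
  <a class="tag" href="/tag/full-color/">full color</a>
  <a class="tag" href="/tag/english/">english</a>
  <a href="/about/">About</a>
</body></html>
"""

metadata = parse_metadata(SAMPLE)
```

Because the extraction rules live in one small class, a site changing its markup should only require adjusting that class, not the code that consumes the metadata.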

Depending on whether the sites use CAPTCHAs, authentication could also be difficult. It might be easier to leverage the Chromium instance bundled with Electron to load the sites themselves and read their data from the rendered pages. On the other hand, this approach might be vulnerable to dubious redirects to ad sites. Another idea is to use <iframe>.
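One possible way to limit the redirect problem is to check every navigation target against an allowlist of hosts before letting the embedded browser follow it. The sketch below shows only that check, written in Python for brevity; the host `example.org` and the function name `is_allowed_navigation` are placeholders, not part of the project. In Electron, a check like this could plausibly be called from a `will-navigate` handler that cancels disallowed navigations.

```python
from urllib.parse import urlsplit

# Placeholder allowlist; the real list would contain the supported sites.
ALLOWED_HOSTS = {"example.org"}


def is_allowed_navigation(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the URL uses HTTPS and stays on an allowed host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Accept the host itself and its subdomains, but not look-alikes
    # such as "example.org.evil.test".
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```

Under these assumptions, `is_allowed_navigation("https://www.example.org/g/1/")` is accepted, while a redirect to `https://ads.example.net/` is rejected.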

website	official API	user data	metadata	torrent download
nhentai	-	✓	✓	✓
Tsumino	-	✓	✓	-
E-Hentai	-	✓	✓	✓
Hentai Cafe	-	-	✓	-

nhentai

probably the most popular (either this or E-Hentai)

Tsumino

provides direct zip download, but it is locked behind Google's reCAPTCHA
the normal image view seems to have some kind of authentication key shenanigans as well

E-Hentai

https://exhentai.org/
will probably be archived in the near future

Hentai Cafe

the most bare-bones functionality, probably the easiest to crawl
