Website Crawling

The application needs to be able to read data from established hentai sites. This includes the manga themselves, their metadata (tags, author, etc.), and user data (lists, ratings, etc.).

This is derived from user stories #4, #5, #6.

None of these sites has an official API, which means the data needs to be read from their HTML.
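As a rough illustration of reading metadata from HTML, here is a minimal sketch that runs in plain Node without a DOM. The markup it expects (title in the first `<h1>`, each tag in an `<a class="tag">`) is purely an assumption for illustration and is not taken from any of the sites. The names `MangaMetadata`, `decodeEntities`, and `parseMetadata` are hypothetical as well. Inside Electron a real DOM would be available, so the regular expressions would likely be replaced by `querySelectorAll` calls.

```typescript
// Hypothetical metadata shape; field names are assumptions for this sketch.
interface MangaMetadata {
  title: string | null;
  tags: string[];
}

// Decode the few HTML entities that commonly show up in titles and tags.
// "&amp;" is handled last so that "&amp;lt;" is not decoded twice.
function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Remove nested markup and surrounding whitespace, then decode entities.
function cleanText(fragment: string): string {
  return decodeEntities(fragment.replace(/<[^>]+>/g, "").trim());
}

// Extract title and tags from raw HTML.
// ASSUMED markup: title in the first <h1>, one <a class="tag"> per tag.
function parseMetadata(html: string): MangaMetadata {
  const titleMatch = /<h1[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  const tagPattern =
    /<a\b[^>]*\bclass="(?:[^"]*\s)?tag(?:\s[^"]*)?"[^>]*>([\s\S]*?)<\/a>/gi;

  const tags: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    tags.push(cleanText(match[1]));
  }

  return {
    title: titleMatch ? cleanText(titleMatch[1]) : null,
    tags,
  };
}
```

Each site would need its own selectors or patterns, so a per-site parser behind a shared `MangaMetadata` interface seems like a reasonable starting point.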

Depending on whether the sites use CAPTCHAs, authentication could also be difficult. It might be easier to use the Chromium instance bundled with Electron to load the sites themselves and read the data from the rendered pages. On the other hand, this might be vulnerable to dubious redirects to ad sites. Another idea is to use an <iframe>.
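One way to limit the redirect risk when loading sites in Electron's Chromium is to check every navigation against an allow-list of hosts. The sketch below shows only the check itself, as a pure function. The function name `isAllowedNavigation` and the example host are hypothetical, and requiring HTTPS is an added assumption rather than something the sites are known to need.

```typescript
// Decide whether a navigation target may be loaded.
// allowedHosts is a caller-supplied allow-list of exact host names.
function isAllowedNavigation(
  targetUrl: string,
  allowedHosts: ReadonlySet<string>
): boolean {
  let parsed: URL;
  try {
    parsed = new URL(targetUrl);
  } catch {
    // Unparsable URLs are rejected outright.
    return false;
  }
  // ASSUMPTION: only HTTPS pages are ever crawled.
  if (parsed.protocol !== "https:") {
    return false;
  }
  return allowedHosts.has(parsed.hostname);
}

// In Electron's main process this could be wired up roughly as follows
// (not executed here, since it needs a running BrowserWindow named `win`):
//
//   win.webContents.on("will-navigate", (event, url) => {
//     if (!isAllowedNavigation(url, allowed)) event.preventDefault();
//   });
//
// The "will-redirect" event could be guarded the same way for
// server-side redirects.
```

For example, with an allow-list containing only the placeholder host `gallery.example`, `https://gallery.example/g/1/` would pass while `https://ads.example/landing` would be blocked.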

website | provides API | user data | metadata | torrent | download
nhentai -
Tsumino - -
E-Hentai -
Hentai Cafe - - -

nhentai

  • probably the most popular (either this or E-Hentai)

Tsumino

  • provides direct zip download, but locked behind Google's reCAPTCHA
  • the normal image view seems to have some kind of authentication key shenanigans as well

E-Hentai

Hentai Cafe

  • the most bare-bones functionality, probably the easiest to crawl