二階堂春希

春希のブログ

山雨欲来风满楼,故攻八面以铸无双。 孤战非所望,俗安不可期。

Some thoughts on operating a large Galgame resource site.

The Endpoint forum is dead, and most galgame resource sites have shut down to stay out of trouble. We still need a strong resource site. Here are some ideas that I hope will help anyone thinking about running one.

No Illusions about Politics

There is a saying that if you don't seek politics, it will seek you.

While mourning the arrest of the Endpoint forum owner, have you thought any further? This is what is called the karma of foolishness. If you choose to be the meat on someone else's chopping board, do not complain that fate is unfair when you are chopped up and eaten. Have you considered that the tears you shed now are the water that once got into your brain?

The Endpoint forum owner lacked a clear understanding of politics. He naively believed that China would overlook a forum run from inside the country, and he trusted to luck. Worse, he abetted tyranny by enforcing speech censorship on the forum and even supporting real-name registration. In the end, the real-name registration and the domestic brands he supported were what sent him to prison.

A person's public actions should be consistent with their political stance; otherwise they leave themselves exposed. If you want to run a galgame resource site, you should understand clearly that doing so is illegal in China and can be punished at any time. Resisting Chinese censorship and regulation is therefore unavoidable.

China is striving to catch up with North Korea, which is far ahead; this is plain to see. We cannot even be sure that censorship and authoritarianism will not one day make it impossible to play galgames in China.

Front-end and Back-end Choices

An SSR architecture is actually a good choice. With Cloudflare in front, the rendered pages can be cached effectively, so the site can tolerate concurrency comparable to a CS architecture. The main advantage of SSR, however, is that it makes web scraping harder.

For backend storage, OneDrive is the obvious choice because it is convenient and free. The alternatives do not hold up at 1 TB of traffic per day, which averages out to roughly 90 to 100 Mbit/s sustained before any peaks: a VPS/VDS will not have enough bandwidth, and object storage becomes alarmingly expensive. Capacity is not a big issue. Even with OneDrive, though, multiple accounts are needed for load balancing so that no single account exceeds the API call limits. Another advantage of OneDrive is that it is accessible from within China, although speeds may be low.
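
The multi-account load balancing mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `AccountPool` class, the account names, and the per-window `budget` are all hypothetical and do not reflect Microsoft's actual limits.

```python
import itertools


class AccountPool:
    """Round-robin over storage accounts, skipping any that spent their budget.

    `budget` is a hypothetical per-window call allowance, not a real API limit.
    """

    def __init__(self, accounts, budget):
        self.accounts = list(accounts)
        self.budget = budget
        self.used = {name: 0 for name in self.accounts}
        self._cycle = itertools.cycle(self.accounts)

    def pick(self):
        """Return the next account with remaining budget, or None if all are exhausted."""
        for _ in range(len(self.accounts)):
            name = next(self._cycle)
            if self.used[name] < self.budget:
                self.used[name] += 1
                return name
        return None

    def reset_window(self):
        """Call from a scheduled task when the rate-limit window rolls over."""
        self.used = {name: 0 for name in self.accounts}


# Three placeholder accounts, each allowed two calls per window.
pool = AccountPool(["account-a", "account-b", "account-c"], budget=2)
picks = [pool.pick() for _ in range(7)]
print(picks)
```

When `pick()` returns `None`, every account has used up its allowance for the current window, and the site would have to queue or reject the request until the window resets.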

The OneDrive token should be cached. Otherwise, under high concurrency, the large number of asynchronous calls to the Microsoft API may exhaust memory. The caching strategy should also ensure that a token handed to a user stays valid long enough for the download to finish.

The cache can live in a store such as Redis, and the scheduled refresh tasks are relatively easy to set up.
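
One way to picture the token caching strategy is the in-memory sketch below. It is not tied to any real Microsoft API: `fetch_token` is a hypothetical callable that returns a token and its lifetime in seconds, and `min_remaining` is an assumed safety margin meant to cover the longest expected download.

```python
import threading
import time


class TokenCache:
    """Cache one access token and refresh it before it gets too close to expiry."""

    def __init__(self, fetch_token, min_remaining, clock=time.monotonic):
        # fetch_token: hypothetical callable returning (token, lifetime_seconds)
        # min_remaining: seconds a handed-out token must still be valid for
        self._fetch_token = fetch_token
        self._min_remaining = min_remaining
        self._clock = clock
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # The lock makes concurrent requests share one refresh instead of many.
        with self._lock:
            remaining = self._expires_at - self._clock()
            if self._token is None or remaining < self._min_remaining:
                token, lifetime = self._fetch_token()
                self._token = token
                self._expires_at = self._clock() + lifetime
            return self._token


# Demo with a fake token source and a fake clock, so no network is involved.
calls = []
now = [0.0]


def fake_fetch():
    calls.append(now[0])
    return f"token-{len(calls)}", 3600


cache = TokenCache(fake_fetch, min_remaining=600, clock=lambda: now[0])
first = cache.get()    # nothing cached yet, so this fetches
second = cache.get()   # served from the cache
now[0] = 3100.0        # only 500 s of validity left, below the 600 s margin
third = cache.get()    # refreshed
print(first, second, third, len(calls))
```

The same idea carries over to Redis by storing the token under a key whose TTL is the token lifetime minus the margin; a server built on `asyncio` would use an `asyncio.Lock` in place of the thread lock.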

Organizing a Large Amount of Resources

Organizing a large amount of resources is laborious. As BT sites and E-hentai show, standardized resource naming organizes resources effectively and makes them easier to search.

A suggested resource naming format is:

(Series Name)[Company Name 1][Company Name 2][Company Name n] Original Japanese Name (Chinese Name 1)(Chinese Name 2)(Chinese Name n)[Platform]{Chinese Translation Group, etc.}

The series name, the Chinese names, and the translation group information may be omitted.
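
As an illustration of the format, names could be generated from structured fields so that every uploader produces the same layout. The `build_name` helper and the sample values below are hypothetical and only meant as a sketch.

```python
def build_name(publishers, jp_name, platform, series=None, cn_names=(), comment=None):
    """Assemble a resource name in the suggested format; optional parts are skipped."""
    if not publishers:
        raise ValueError("at least one company name is required")
    parts = []
    if series:
        parts.append(f"({series})")
    parts.append("".join(f"[{p}]" for p in publishers))
    parts.append(f" {jp_name} ")
    parts.append("".join(f"({c})" for c in cn_names))
    parts.append(f"[{platform}]")
    if comment:
        parts.append(f"{{{comment}}}")
    return "".join(parts)


# Placeholder values, full form and minimal form.
full = build_name(
    ["Company A", "Company B"], "Title", "PC",
    series="Series", cn_names=["CN 1", "CN 2"], comment="Group",
)
minimal = build_name(["Company A"], "Title", "PC")
print(full)
print(minimal)
```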

This naming format can be recognized using regular expressions, so if necessary, it can be directly analyzed based on the name without using a database. The reference code is as follows:

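import re

# Example string
example = "(Series Name)[Company Name 1][Company Name 2][Company Name n] Original Japanese Name (Chinese Name 1)(Chinese Name 2)(Chinese Name n)[Platform]{Chinese Translation Group, etc.}"

# Regular expression. Repeated parts are captured as one block and split
# afterwards, because a repeated capture group only keeps its last match.
# Limitation: names that themselves contain brackets will not parse.
pattern = re.compile(
    r'(?:\((?P<series>[^()]*)\))?\s*'         # optional (Series Name)
    r'(?P<publishers>(?:\[[^\[\]]*\])+)\s*'   # one or more [Company Name]
    r'(?P<jp_name>[^()\[\]{}]+?)\s*'          # Original Japanese Name
    r'(?P<cn_names>(?:\([^()]*\))*)\s*'       # zero or more (Chinese Name)
    r'(?:\[(?P<platform>[^\[\]]*)\])?\s*'     # [Platform]
    r'(?:\{(?P<comment>[^{}]*)\})?'           # optional {Translation Group}
)

# Parse the string; fullmatch rejects names with trailing garbage
match = pattern.fullmatch(example)

if match:
    # Extract data, splitting the repeated blocks into lists
    publishers = re.findall(r'\[([^\[\]]*)\]', match.group('publishers'))
    cn_names = re.findall(r'\(([^()]*)\)', match.group('cn_names'))

    # Create object; omitted optional parts become None or an empty list
    result = {
        "series": match.group('series') or None,
        "publisher": publishers,
        "jpName": match.group('jp_name').strip(),
        "cnName": cn_names,
        "platform": match.group('platform') or None,
        "comment": match.group('comment') or None
    }
else:
    result = None

print(result)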

Once the collection is large enough, you can crawl VNDB and store galgame information locally to make retrieval easier and to run various algorithms on it. This enables advanced tag-based search similar to E-hentai's, as well as recommendation and popularity algorithms based on graph structures.
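
A tag search of this kind could be built on an inverted index over the locally stored metadata. The sketch below uses placeholder titles and tags; the `TagIndex` class is hypothetical and does not mirror VNDB's actual schema or E-hentai's implementation.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index from tag to the set of titles carrying that tag."""

    def __init__(self):
        self._by_tag = defaultdict(set)
        self._all = set()

    def add(self, title, tags):
        self._all.add(title)
        for tag in tags:
            self._by_tag[tag.lower()].add(title)

    def search(self, include=(), exclude=()):
        """Titles that carry every tag in `include` and none in `exclude`."""
        result = set(self._all)
        for tag in include:
            result &= self._by_tag.get(tag.lower(), set())
        for tag in exclude:
            result -= self._by_tag.get(tag.lower(), set())
        return sorted(result)


# Placeholder records standing in for locally stored metadata.
index = TagIndex()
index.add("Title A", ["mystery", "school"])
index.add("Title B", ["mystery", "fantasy"])
index.add("Title C", ["school", "comedy"])

with_mystery = index.search(include=["mystery"])
school_no_mystery = index.search(include=["school"], exclude=["mystery"])
print(with_mystery)
print(school_no_mystery)
```

Set intersection and difference keep each query cheap, and the same tag-to-title mapping can later feed a graph for recommendation or popularity ranking.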

Resource Storage

Obviously, the same content needs to be stored in multiple OneDrive accounts. Beyond that, it should also be kept in a reliable cloud storage service (such as MEGA) or locally at a separate location.

A single local copy is not safe, because any number of accidents can wipe out all the data. Several people in different locations therefore need to hold the same data to achieve off-site disaster recovery.
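
When several people hold copies, it helps to be able to check that a replica still matches the reference set. A minimal sketch, assuming each holder can compute SHA-256 digests of their files and exchange the resulting manifest; the function names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large archives do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def diff_manifests(reference, replica):
    """Return files missing from the replica and files whose content differs."""
    missing = sorted(set(reference) - set(replica))
    changed = sorted(
        name for name in reference.keys() & replica.keys()
        if reference[name] != replica[name]
    )
    return missing, changed


# Demo with two temporary directories standing in for two holders' copies.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    Path(a, "game1.zip").write_bytes(b"data-1")
    Path(a, "game2.zip").write_bytes(b"data-2")
    Path(b, "game1.zip").write_bytes(b"data-1-corrupted")
    missing, changed = diff_manifests(build_manifest(a), build_manifest(b))

print(missing, changed)
```

Only the manifests need to travel between holders, so a 12 TB collection can be compared without moving the data itself.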

The choice of local storage media is a problem. Storing all galgames takes about 12 TB, and hard disks of that capacity are fairly expensive, costing over 1,000 RMB. If conditions permit, inexpensive tape can also serve as storage media. Second-hand hard disks, however cheap, should not be used, because there is no telling when they will fail and take the data with them.

Cloud storage is not cheap either. If everything is stored only in OneDrive, all resources could be lost to an expired subscription or a suspended account.

Costs

In theory the OneDrive approach should cost very little, but as more people use the site, maintenance and server costs will inevitably appear. On the other hand, more users also mean more opportunities to recover those costs.

A proper galgame resource site should not put up download barriers, let alone charge for downloads. The reasonable ways to offset costs are donations and advertisements.

Accepting donations carries some risk, because it may require domestic payment methods. As for advertisements, once enough people use the site, advertisers will come looking.

When placing advertisements, however, care should be taken not to harm the user experience.

That is all for now; more may be added later.
