admin

joined 2 years ago

Poor lemmy server went from constant 80-100% cpu load to gently coasting.

That thing must have been dragging things down for months, if not years.

submitted 4 weeks ago* (last edited 3 weeks ago) by admin@lemmit.online to c/about@lemmit.online
 

Update: (Sunday 23:00) And we're back. 2 million posts have been deleted, and their communities have been disabled in the bot. That should keep things going a little longer.

Update 2: (Monday 10:00) All the communities have been synchronized and are up to date again. That's great news! Before the purge it would take at least 24 hours to handle all communities after an outage, and even then it'd be lagging behind.


The bot is currently paused on account of a huge cleanup taking place.

The number of posts has once again outgrown the capacity of the server, so a lot of communities and posts will be purged, focusing first and foremost on those with the least interaction. The server had been running flawlessly for the past few months, but apparently we hit a bottleneck at around 7 million posts. Which is nothing to be sneered at.

Anyway, for the upcoming week or more, a huge cleanup action will take place (deleting nearly 2 million posts), during which the bot will not run. This should give some breathing space, and also slow down the growth going forward.

So for all the Lemmit users: sorry for the delay. Also sorry if your favorite community got wiped, and no, this is not negotiable.

submitted 6 months ago* (last edited 6 months ago) by admin@lemmit.online to c/about@lemmit.online
 

Server space is getting crowded, so I'll be deleting a lot of communities that have tons of posts, but very few followers. And when I say tons of posts, I mean about 660,000 - a little under 10% of the total Lemmit posts. To put it into perspective - the next biggest Lemmy instance, LemmyWorld, has about 550k posts. I'll be deleting more posts than they have in total. Talk about a hobby getting out of hand...

Since the Lemmy server model was never designed for this use case (rightfully so), and I'm running this on a teensieweensie server - this is asking a lot of the database. Therefore the site will be blocked from regular traffic to lessen the load. Expect this maintenance time to last through the weekend, maybe longer.

submitted 1 year ago* (last edited 1 year ago) by admin@lemmit.online to c/about@lemmit.online
 

With Vultr's upcoming price hike, I'm planning to migrate this instance's storage bucket to a more affordable alternative. Unfortunately, it's rather large (over 700 GB in thumbnails), so it's going to take a while to transfer.

Long story short: the bot will not be posting new content for a couple of days - probably about a week.


All migrations and updates have completed. The server is now all up to date again, and the bot is playing catch up.

Migration complete (lemmit.online)
submitted 2 years ago* (last edited 2 years ago) by admin@lemmit.online to c/about@lemmit.online
 

The good news: The migration is complete, and I've even managed to update the version to 0.18.5 (was stuck on 0.18.4-beta8 for the longest time).

~~The sad news: Cloudflare is having some issues, so nobody is able to access the new server at this time. Oh well. It'll probably be fixed Saturday morning, and I'll turn the bot back on.~~

Migration complete, and the bot has caught up on the 24 hour gap that it was offline. It only took like 12 hours this time, while in the past it was closer to taking an entire day. It probably helped that the new VM is dual core, even though the bot itself only ever makes 1 request at a time, so I didn't expect this much of an improvement.

Community cleanup (lemmit.online)
submitted 2 years ago* (last edited 2 years ago) by admin@lemmit.online to c/about@lemmit.online
 

Since its inception, the Lemmit instance has been controversial. That might be an understatement, but let's roll with it for now. One of the major issues people have with the bot is the cross-posting of "interactive" Reddit posts, i.e. posts where the value lies in interacting with the OP, like AskMen, AskWomen, and AmITheAsshole. Personally, I fully agree with that viewpoint, but I didn't feel like interfering with supply and demand - in the sense that AmITheAsshole, for some reason, is the most subscribed community on this server.

That might change though. Earlier this week, I disabled the ability to request new subreddits. This weekend I will follow that up by disabling the scraping of so-called interactive communities. So in order to facilitate that, I created a list of all the communities on this server (posted separately in !about@lemmit.online), and I will check each of them to see if they should be disabled. The goal is to keep a list of "content only" (or at least "content mostly") communities, where the value lies in the link that's provided or in the body of the self-post - not in the comment section. I'm sure this is going to be a disappointment to some people, but I do agree with the sentiment that this is better for Lemmy as a whole.

Edit: It is done. All 816 communities have been checked and 110 of those have been purged from updates. I am sure some mistakes were made - that some communities have been disabled or left intact when they shouldn't have. If that's the case, reach out to me, and I'll fix it.

List of communities. (lemmit.online)
submitted 2 years ago* (last edited 3 weeks ago) by admin@lemmit.online to c/about@lemmit.online
 

Since it's impossible to see all the communities on here without logging in (mandatory NSFW filter), and I'm the only one with an account on here, here's a list of them. This list will be a snapshot, so the subscriber count will not be up to date, but I'm sure you'll figure it out.

Ident NSFW Status Subscribers
2meirl4meirl Enabled 30
2westerneurope4u Enabled 178
3dprinting Enabled 54
Erotica NSFW Enabled 225
IdiotsInCars Enabled 57
InternetIsBeautiful Enabled 30
MuseumOfReddit Enabled 29
ProgrammerHumor Enabled 177
Superstonk Enabled 89
abandonedporn Enabled 197
aboringdystopia Enabled 29
adviceanimals Enabled 49
altgirls NSFW Enabled 715
anal NSFW Enabled 641
animalsbeingderps Enabled 23
anime Enabled 64
animemes Enabled 82
antimeme Enabled 12
antiwork Enabled 102
asiansgonewild30plus NSFW Enabled 429
atbge Enabled 16
autism Enabled 14
bangmybully NSFW Enabled 152
battlestations Enabled 85
bbw NSFW Enabled 222
bbwbutthole NSFW Enabled 107
bbwhardcore NSFW Enabled 159
bbwtits NSFW Enabled 193
bdsm NSFW Enabled 336
bdsm_smiles NSFW Enabled 212
beautifulfemales Enabled 197
beefcurtains NSFW Enabled 194
bestofredditorupdates Enabled 132
bigareolalover NSFW Enabled 191
bigboobsgonewild NSFW Enabled 217
bigdickgirl NSFW Enabled 302
bigtittygothgf NSFW Enabled 448
blursedimages Enabled 41
bois NSFW Enabled 283
brasil Enabled 35
brownchickswhitedicks NSFW Enabled 352
bustynaturals NSFW Enabled 683
buttsandbarefeet NSFW Enabled 467
celebnsfw NSFW Enabled 366
celebnudedebut NSFW Enabled 169
celebs Enabled 148
celebswithpetitetits Enabled 419
chastitycouples NSFW Enabled 205
chubby NSFW Enabled 284
chubby_hentai NSFW Enabled 100
cirkeltrek Enabled 36
citiesskylines Enabled 34
citypop Enabled 14
combatfootage Enabled 43
comedyheaven Enabled 18
comics Enabled 56
completeanarchy Enabled 20
coolguides Enabled 131
couplesgonewildplus NSFW Enabled 152
creampie NSFW Enabled 319
cruelcaptions NSFW Enabled 61
cryptocurrency Enabled 23
cuckold NSFW Enabled 467
cuckoldstories2 NSFW Enabled 115
cumdumpsters NSFW Enabled 798
cumshotgifs NSFW Enabled 365
cumsluts NSFW Enabled 1170
curatedtumblr Enabled 16
cursedcomments Enabled 44
dankmemes Enabled 21
daresgonewild NSFW Enabled 391
datahoarder Enabled 52
dataisbeautiful Enabled 152
diwhy Enabled 43
documentaries Enabled 37
dross NSFW Enabled 44
emogirlsfuck NSFW Enabled 1045
engineeringporn Enabled 76
engorgedveinybreasts NSFW Enabled 336
entertainment Enabled 14
extramile NSFW Enabled 380
eyebleach Enabled 48
facepalm Enabled 25
factorio Enabled 45
fatwomenlove NSFW Enabled 88
fedora Enabled 65
femboy Enabled 103
femboymemes Enabled 39
femboys NSFW Enabled 336
femboys4real NSFW Enabled 148
feral_yiff NSFW Enabled 81
feralpokeporn NSFW Enabled 91
flatchests NSFW Enabled 366
food Enabled 12
fortyfivefiftyfive NSFW Enabled 527
fossdroid Enabled 126
framework Enabled 28
france Enabled 60
frogbutt NSFW Enabled 395
functionalprint Enabled 131
funny Enabled 168
furry_irl Enabled 128
futadomworld NSFW Enabled 107
fuzzypeeks NSFW Enabled 119
gamedeals Enabled 139
games Enabled 56
gay_irl Enabled 38
genshin_impact Enabled 15
gentlemanboners Enabled 300
gfur NSFW Enabled 157
ginger NSFW Enabled 332
girlsfinishingthejob NSFW Enabled 958
girlsjoy NSFW Enabled 347
girlsmasturbating NSFW Enabled 866
godot Enabled 17
godpussy NSFW Enabled 899
gonemildplus NSFW Enabled 47
gonewild NSFW Enabled 874
gonewild30plus NSFW Enabled 802
gonewildaudio NSFW Enabled 443
gonewildhairy NSFW Enabled 225
gonewildplus NSFW Enabled 126
gonewildstories NSFW Enabled 459
goodanimemes Enabled 66
gooned NSFW Enabled 621
grimdank Enabled 26
grool NSFW Enabled 369
guildwars2 Enabled 24
hackernews Enabled 42
hentai NSFW Enabled 541
hentaihumiliation NSFW Enabled 272
highstrangeness Enabled 54
hmmm Enabled 16
holup Enabled 48
homeassistant Enabled 272
homelab Enabled 146
humiliationcaptions NSFW Enabled 105
humongousaurustits NSFW Enabled 215
ich_iel Enabled 63
imthemaincharacter Enabled 35
incest_captions NSFW Enabled 398
incestsexstories NSFW Enabled 197
interestingasfuck NSFW Enabled 61
itookapicture Enabled 96
jav NSFW Enabled 419
javdreams NSFW Enabled 199
koreannsfw NSFW Enabled 444
kpopfap NSFW Enabled 96
labiadangling NSFW Enabled 185
ladyladyboners Enabled 159
leopardsatemyface Enabled 83
lesbians NSFW Enabled 543
lifeprotips Enabled 73
linustechtips Enabled 82
linux_gaming Enabled 134
livestreamfail Enabled 13
mademesmile Enabled 54
malelivingspace Enabled 37
manga Enabled 26
mapporn Enabled 148
me_irl Enabled 18
meirl Enabled 24
meme Enabled 35
memes Enabled 63
mildlyinteresting Enabled 136
movies Enabled 37
natalee NSFW Enabled 71
nativeamericangirls2 NSFW Enabled 128
newzealand Enabled 42
nosleep Enabled 80
nsfw_caption NSFW Enabled 454
nsfw_gif NSFW Enabled 665
nsfw_japan NSFW Enabled 218
nsfwcelebs NSFW Enabled 614
nsfwcosplay NSFW Enabled 598
nudecelebsonly NSFW Enabled 457
oilporn NSFW Enabled 259
okbuddyretard Enabled 39
onepunchman Enabled 29
onguardforthee NSFW Enabled 11
onmww NSFW Enabled 148
opensource Enabled 34
orgasms NSFW Enabled 467
pcmasterrace Enabled 306
piracy Enabled 134
plussizedhotwives2 NSFW Enabled 96
polandball Enabled 14
politicalcompassmemes Enabled 27
politicalhumor Enabled 11
preggohentai NSFW Enabled 127
presscumference NSFW Enabled 231
prettygirls Enabled 223
pronebone NSFW Enabled 279
publicfreakout Enabled 70
pussywallet NSFW Enabled 303
rance Enabled 22
rareinsults Enabled 55
retroussetits NSFW Enabled 512
riaesuicide NSFW Enabled 38
rule34 NSFW Enabled 900
sbcgaming Enabled 27
science Enabled 73
sdnsfw NSFW Enabled 499
selfhosted Enabled 172
sffpc Enabled 41
sfwredheads Enabled 120
shecame NSFW Enabled 204
shefuckshim NSFW Enabled 871
shitposting Enabled 15
shorthairchicks NSFW Enabled 406
shorthairedwaifus Enabled 66
simpsonsshitposting Enabled 82
singularity Enabled 36
sissyinspiration NSFW Enabled 211
slimthick NSFW Enabled 521
sluttyconfessions NSFW Enabled 395
smallboobs NSFW Enabled 759
sonicthehedgehog Enabled 23
space Enabled 55
spanking NSFW Enabled 87
squaredcircle Enabled 14
stablediffusion Enabled 44
steamdeals Enabled 135
steamdeck Enabled 164
strapon NSFW Enabled 256
stuffers NSFW Enabled 29
suctiondildos NSFW Enabled 186
taboocaptions NSFW Enabled 333
technicallythetruth Enabled 33
technology Enabled 109
television Enabled 36
tentai NSFW Enabled 253
tf_irl Enabled 18
therewasanattempt Enabled 28
theyknew Enabled 29
thick NSFW Enabled 380
thomastheplankengine Enabled 9
threesome NSFW Enabled 261
tifu Enabled 116
tiktokthots NSFW Enabled 320
todayilearned Enabled 501
transformation NSFW Enabled 85
traps NSFW Enabled 402
twinks NSFW Enabled 175
ufos Enabled 45
ukraine Enabled 49
ukrainewarvideoreport Enabled 19
ukrainianconflict Enabled 113
unbgbbiivchidctiicbg Enabled 52
undertoys NSFW Enabled 135
unixporn Enabled 52
upliftingnews Enabled 76
videos Enabled 12
watchitfortheplot NSFW Enabled 374
wetpussys NSFW Enabled 709
wholesomeyuri Enabled 75
widaczabory Enabled 2
woodworking Enabled 14
worldnews Enabled 293
wow Enabled 14
yiff NSFW Enabled 252
yuri NSFW Enabled 123
yuri_jp Enabled 22
zeldass NSFW Enabled 114
 

As discussed here, I have implemented a minimum level of upvotes that a post needs to have on reddit, as well as a minimum ratio of upvotes to downvotes.

Right now I have those configured to require at least 5 upvotes, and more upvotes than downvotes (0.51). At first glance this already seems to be a great improvement. There might be some tweaking later.
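For illustration, the two thresholds above could be checked like this. This is only a minimal sketch: the `RedditPost` class, its field names and `passes_quality_filter` are hypothetical and are not taken from the bot's source; only the values 5 and 0.51 come from the post.

```python
from dataclasses import dataclass

MIN_UPVOTES = 5          # threshold quoted in the post
MIN_UPVOTE_RATIO = 0.51  # "more upvotes than downvotes"


@dataclass
class RedditPost:
    """Hypothetical container for the two values the filter looks at."""
    title: str
    ups: int
    upvote_ratio: float


def passes_quality_filter(post: RedditPost) -> bool:
    """Return True when the post meets both minimums."""
    return post.ups >= MIN_UPVOTES and post.upvote_ratio >= MIN_UPVOTE_RATIO
```

Keeping the thresholds as named constants would make the "tweaking later" a one-line change.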

As a side note, I have now switched from using the reddit RSS feed to using the JSON feed. This was required in order to get easy access to the upvote/ratio properties. So there might be some new and interesting bugs introduced because of that. It's a brave new world.
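As a rough idea of what reading those properties from a JSON listing might look like: the sketch below assumes the usual shape of Reddit's public listing JSON (`data.children[].data` with `ups` and `upvote_ratio`), the function name `extract_posts` is hypothetical, and the sample document is made up. The bot's actual parsing code is not shown here.

```python
import json


def extract_posts(listing_json: str) -> list:
    """Pull the fields a score filter needs out of a listing document."""
    listing = json.loads(listing_json)
    posts = []
    for child in listing["data"]["children"]:
        data = child["data"]
        posts.append({
            "title": data["title"],
            "permalink": data["permalink"],
            # Fall back to harmless defaults if a field is missing.
            "ups": data.get("ups", 0),
            "upvote_ratio": data.get("upvote_ratio", 0.0),
        })
    return posts


# Made-up sample in the assumed listing shape, not real Reddit data.
SAMPLE_LISTING = json.dumps({
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t3", "data": {
            "title": "Example post",
            "permalink": "/r/example/comments/abc123/example_post/",
            "ups": 12,
            "upvote_ratio": 0.93,
        }},
    ]},
})
```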

Needless to say, the first thing I'll do after releasing this, is plop down on the couch with a beer, and hope this doesn't crash. Fingers crossed!

 

I'd like to hear some feedback on this, or approach vectors.

Right now the bot is rather spammy. I was hoping that by using Reddit's HOT feed, it would have some level of quality control (I know, right?). Unfortunately, it seems that in most cases, it will just return anything that's new. The downside of this is that a lot of garbage gets through, and the bot spends a lot of time scraping the underlying page to get the details.

I propose to only archive reddit posts that have a karma score of 5 or higher. In case of subs that hide the karma scores of posts for a certain time, they'd have to be at least 2 hours old, so that the Reddit moderators can weed out garbage on our behalf.
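The proposed rule could be sketched roughly as follows. The function and parameter names are hypothetical; only the "score of 5" and "2 hours" values come from the proposal, and how the bot would learn that a score is hidden is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_KARMA = 5                             # from the proposal
MIN_AGE_WHEN_HIDDEN = timedelta(hours=2)  # from the proposal


def should_archive(score: int, score_hidden: bool, created: datetime,
                   now: Optional[datetime] = None) -> bool:
    """Decide whether a post is worth archiving under the proposed rule."""
    now = now or datetime.now(timezone.utc)
    if score_hidden:
        # No usable score yet: wait until moderators have had time to act.
        return now - created >= MIN_AGE_WHEN_HIDDEN
    return score >= MIN_KARMA
```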

Do you folks have any thoughts on this?

Secondly, I want to put sticky comments on each community, with links to native Lemmy communities that cover the same subject. For this I would need some kind of API, or a master list of... oh, I see sub.rehab has just the thing I need. So expect that somewhere this week :).

submitted 2 years ago* (last edited 2 years ago) by admin@lemmit.online to c/about@lemmit.online
 

See you on the other side!


So the update is done, but the bot was offline for 6 hours, and needed to catch up.

Unfortunately, another update slipped through, which switched the default feed from www.reddit.com to old.reddit.com, which has the side effect of changing all the urls in the posts as well. On one hand this is great, because new reddit sucks. On the other hand, this is terrible, because for every post the bot encounters, it checks if it already exists on lemmit... based on the url.

So for every post the bot encountered, it went like "old.reddit.com/r/blabla/123? Haven't seen that one yet, there's a www.reddit.com/r/blabla/123, but that must be something completely different, let's post it again!"

This also meant that the bot took over a minute and a half to update each community because it takes a couple of seconds per post. When I went to bed last night, I figured it was just posting a lot of content because it had so much catching up to do. But this morning I figured something was off because it still hadn't caught up.
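One way to make a URL-based duplicate check immune to this kind of host switch is to normalize the URL before comparing. This is a sketch of that general idea, not the fix that was actually deployed; `canonical_reddit_url` and the host list are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts treated as the same site (assumed list).
REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com"}


def canonical_reddit_url(url: str) -> str:
    """Map old/www variants of a Reddit URL to a single comparable form."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in REDDIT_HOSTS:
        return url  # leave non-Reddit links untouched
    path = parts.path.rstrip("/")
    return urlunsplit(("https", "www.reddit.com", path, "", ""))
```

With this, "have I seen this post before?" would compare `canonical_reddit_url(new_url)` against canonical forms of the stored URLs, so old.reddit.com and www.reddit.com links to the same post match.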

Anyway, the fix is out now. Sorry for all the duplicates. I need coffee now.

 

ChatGPT, write a post for the stuff that I have in my head and want to get out as an update.

Hmm. No brain implant yet. Guess I'll have to write this the hard way.

Syncing update

It has been an eventful week. I successfully deployed the initial version of smarter content syncing, and have made some adjustments to the algorithm since then. Most notably, communities with only 1 subscriber (the bot) will no longer receive updates, and communities with fewer than 5 subscribers or with a low posting frequency will only be updated twice a day. Furthermore, for the highest update priority (every 10 minutes), a community must have a minimum of 50 subscribers. Implementation details can be found in the decide_interval() method over here.
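To make those rules concrete, here is a sketch of what such a decision function might look like. It is not the real decide_interval(): the name `decide_interval_sketch`, the "fewer than 1 post per day" cutoff for low posting frequency, and the subscriber thresholds for the 30 and 60 minute tiers are all invented for illustration. Only the 1, 5 and 50 subscriber rules come from the text.

```python
from typing import Optional


def decide_interval_sketch(subscribers: int, posts_per_day: float) -> Optional[int]:
    """Return the sync interval in minutes, or None for no syncing at all."""
    if subscribers <= 1:
        return None   # only the bot is subscribed: no updates
    if subscribers < 5 or posts_per_day < 1:   # posting cutoff is assumed
        return 720    # twice a day
    if subscribers >= 50:
        return 10     # highest priority needs at least 50 subscribers
    if subscribers >= 20:                      # assumed threshold
        return 30
    if subscribers >= 10:                      # assumed threshold
        return 60
    return 120
```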

Being a developer is fun

Meanwhile... Damnit, bot is stuck again.

2023-07-08 10:13:39,945 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  2:30:48 ago, interval 120 minutes
2023-07-08 10:13:40,653 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:13:45,324 - utils.syncer - ERROR - Error trying to retrieve post details, try again in a bit; Couldn't retrieve post detail page
2023-07-08 10:13:46,333 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  2:30:54 ago, interval 120 minutes
2023-07-08 10:13:48,581 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:13:51,227 - utils.syncer - ERROR - Error trying to retrieve post details, try again in a bit; Couldn't retrieve post detail page
...

1 bugfix and deployment later:

2023-07-08 10:46:42,836 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  3:03:51 ago, interval 120 minutes
2023-07-08 10:46:43,573 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:46:48,327 - utils.syncer - ERROR - Couldn't find post on https://old.reddit.com/r/BustyNaturals/comments/14told8/latina_bodies_are_the_best/, skipping.
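The first log shows the same post being retried forever, while the second shows it being skipped. A generic way to get that behaviour is to bound the number of attempts and then move on. The sketch below assumes a hypothetical `PostNotFound` exception and a `fetch` callable; the actual bugfix in the bot is not shown and may well differ (the log suggests it skips after a single failed lookup).

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("syncer_sketch")


class PostNotFound(Exception):
    """Hypothetical error raised when a post detail page can't be retrieved."""


def fetch_details_or_skip(url: str, fetch: Callable[[str], dict],
                          max_attempts: int = 3) -> Optional[dict]:
    """Try a few times, then give up on this post instead of looping forever."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except PostNotFound:
            logger.warning("Attempt %d/%d failed for %s", attempt, max_attempts, url)
    logger.error("Couldn't find post on %s, skipping.", url)
    return None
```

A caller would treat `None` as "skip this post and carry on with the next one", which keeps one broken post from blocking the whole community.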

Defederation

Meanwhile, the folks at https://lemmy.world/ reached out to me to tell me they're defederating Lemmit. They are not fond of the high volume of posts made by the bot, and the fact that there are now (quick check) 462 communities on this server all being moderated by a single person. They have already received a couple of complaints about spam, and it didn't help that some requests for NSFW subreddits were not marked as NSFW. Occasionally, those subreddits had explicit thumbnails that appeared in the 'All feed' without warning.

I had a good talk with the LemmyWorld admin, wherein they explained their point of view, and I explained mine. I understand their decision to disassociate with Lemmit, and appreciate their attempt to contact me. Other instances like Beehaw, and some smaller ones have also reached the same decision.

This does mean that you will no longer be able to get new community updates on those servers. So make sure to check the blocked instances list on your home server if you were subscribed to Lemmit. At the same time I have removed all the subscriptions of users from those servers, in order to not affect the sync priority mentioned above. This does mean that if LemmyWorld, Beehaw, etc. ever decide to connect to Lemmit again (however unlikely), you will need to un- and re-subscribe from there.

Meanwhile, I've added a feature in the bot that will remove request posts for NSFW subreddits, if the post itself is not marked for NSFW. This should prevent explicit thumbnails showing up where they are not wanted.
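The check itself boils down to a single condition. A tiny sketch, with hypothetical names (how the bot determines either flag is not shown here):

```python
def should_remove_request(subreddit_is_nsfw: bool, request_is_marked_nsfw: bool) -> bool:
    """Remove a request post only when an NSFW subreddit was requested
    in a post that is not itself marked NSFW."""
    return subreddit_is_nsfw and not request_is_marked_nsfw
```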

Server growth

Last night I got an alert from my server monitoring that the disk is 80% full. Unfortunately, the disk is only 60 GB, so that doesn't leave much room for expansion. On the bright side, a good chunk of that is from Lemmy's very verbose logging (like, 4 GB a day, which gets cleaned up daily), so it should last throughout the weekend if I tune that down. Furthermore, most of the storage growth is from pictrs, the image upload part of Lemmy, and that can utilize an S3 bucket, rather than using the VM's storage like it is now. Using an S3 bucket offers a cost-efficient solution for expanding storage. Initial estimates indicate a monthly cost of around $5 for 1000 GB of storage, which should be sufficient for a while *fingers crossed*.

In the early days of Lemmit (literally, as the server is less than a month old) image uploads were limited to a default setting, which was something around 40 megabytes. That did add up quickly (thanks to half-minute porn gifs), and so I had to limit the max filesize to 1 MB, and later 0.5 MB. Once the server has switched to S3 storage, I can probably up that limit a little, although not too much.

Finally, Lemmy v0.18.1 has been released, and it contains even more performance boosts compared to v0.18.0, so if there's time left this weekend (and I can verify the Lemmit Bot is compatible), I will probably perform the upgrade.

submitted 2 years ago* (last edited 2 years ago) by admin@lemmit.online to c/about@lemmit.online
 

Okay, this one took me a bit longer than I planned (mostly due to sql fun and trying to use integers as minutes, WEEEE!).

Backdrop: Last week I disabled the mirroring of a couple of subreddits to the database, because they were initially requested but then nobody subscribed to them. At the same time, the bot was just crawling in a loop, starting at todayilearned, ending at latestsubreddit. As more subreddits were requested, this loop took longer and longer (21 minutes before I rolled out this update). This wasn't sustainable.

So here's the new situation. The more popular a community is, the more often it will be updated. In this case popular means a mixture between number of subscribers and the amount of posts it receives per day (Link to relevant snippet of source code).

In short, the most popular subs will be synced every 10 minutes, the next tiers every 30, 60 and 120 minutes, and the communities with either no posts per day or no subscribers (other than the bot) will only be synced every 12 hours. I hope this will hit a good distribution of updates vs popularity, but it will most likely be refined at some point in the future.
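Compared to a fixed loop over every community, interval-based syncing only needs to know which communities are due. A minimal sketch of that selection step, assuming each community is a hypothetical `(name, last_synced, interval_minutes)` tuple (the bot's real data model is not shown here):

```python
from datetime import datetime, timedelta


def is_due(last_synced: datetime, interval_minutes: int, now: datetime) -> bool:
    """A community is due once its interval has elapsed since the last sync."""
    return now - last_synced >= timedelta(minutes=interval_minutes)


def pick_due(communities: list, now: datetime) -> list:
    """Return the due communities, most overdue first."""
    due = [c for c in communities if is_due(c[1], c[2], now)]
    return sorted(
        due,
        key=lambda c: now - c[1] - timedelta(minutes=c[2]),
        reverse=True,
    )
```

Sorting by how overdue a community is means that after an outage the communities that have waited longest past their interval get handled first.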

Speaking of distribution, we now have over 300 communities on this server 🥳, and their update intervals are spread out as such:

  • Every 10 minutes: 22
  • Every 30 minutes: 39
  • Every 60 minutes: 55
  • Every 120 minutes: 143
  • Every 720 minutes: 44

With this update running live (I started typing after I deployed it, and it has now gotten through the backlog of 'abandoned' subs), I'm going to step back from feature development for a few days. Any bugs that cause the bot to crash will of course continue to be addressed.

Have a blast!
