Dran_Arcana

joined 2 years ago
[–] Dran_Arcana@lemmy.world 19 points 1 week ago (3 children)

Yes they were, so I'm offering you an actual theory as to why this may be true, yet difficult to "prove".

Smoking was bad for your health long before anyone sat down and took the time to prove it. Autoregressive LLM tokenizers are a very new area of computer science, and it's going to take a while for the community to collectively understand everything we're currently doing by trial and error.

[–] Dran_Arcana@lemmy.world 36 points 1 week ago (7 children)

Anecdotally, I use it a lot and I feel like my responses are better when I'm polite. I have a couple of theories as to why.

  1. More tokens in the context window of your question, and a clear separator between ideas in a conversation, make it easier for the inference tokenizer to recognize disparate ideas.

  2. Higher quality datasets contain American boomer/millennial notions of "politeness", and when prompts are structured in kind, responses are more likely to contain tokens from those higher quality datasets.

I haven't mathematically proven any of this within the llama.cpp tokenizer, but I strongly suspect that I could at least show a correlation between polite input tokens and output tokens drawn from those higher quality datasets.
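To be concrete about what "prove a correlation" would even mean here, a toy sketch: score prompts for polite tokens (a naive whitespace tokenizer and a made-up marker list, not the real llama.cpp tokenizer) and check whether that score correlates with some quality measure on the responses. All names and data below are hypothetical; a real test would use actual prompt/response logs and dataset annotations.

```python
# Toy sketch: does the presence of "polite" tokens in a prompt correlate
# with some measurable property of the response? All data here is made up.

POLITE_MARKERS = {"please", "thanks", "thank", "kindly", "appreciate"}

def politeness_score(prompt: str) -> int:
    """Count polite tokens using a naive whitespace tokenizer."""
    return sum(1 for tok in prompt.lower().split()
               if tok.strip("?.,!") in POLITE_MARKERS)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Hypothetical (prompt, response-quality) pairs
logs = [
    ("please explain tokenizers, thanks", 0.9),
    ("explain tokenizers", 0.6),
    ("could you kindly summarize this? I appreciate it", 0.95),
    ("summarize this", 0.5),
]
scores = [politeness_score(p) for p, _ in logs]
quality = [q for _, q in logs]
print(pearson(scores, quality))  # positive on this toy data
```

Obviously the hard part is the quality measure on the right-hand side, which is exactly why this is easier to suspect than to prove.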

[–] Dran_Arcana@lemmy.world 8 points 3 weeks ago

Thank you for letting me know what software not to use; good bot

[–] Dran_Arcana@lemmy.world 1 points 3 weeks ago (2 children)

Crossfading and normalization would both independently be dealbreakers for me. I can't go back.

[–] Dran_Arcana@lemmy.world 1 points 3 weeks ago

I would be genuinely surprised if fair use draws the line on format-shifted, legally purchased media at "remote watch-together", leaving format-shifting and local watch-together intact.

If it were up to the studios' interpretation of the law, you'd need to purchase a license for each person during local watch-together.

[–] Dran_Arcana@lemmy.world 14 points 3 weeks ago (9 children)

Agree in principle, but in practice:

  1. parents who live across the state

  2. plexamp for music

[–] Dran_Arcana@lemmy.world 12 points 3 weeks ago

They are indeed just that keen on our data.

They know they can't get rid of it for all of their customers, but they do want to make it as hard as possible for random users to do so.

[–] Dran_Arcana@lemmy.world 9 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

The problem with this is that it doesn't work for home users who want to pay for their software. Crazy... I know... but those people do exist.

[–] Dran_Arcana@lemmy.world 2 points 3 weeks ago (1 children)

For people with "that one game" there is a middle ground. Mine is Destiny 2, which uses a version of Easy Anti-Cheat that refuses to run on Linux. My solution was to buy a $150 used Dell on eBay and a $180 GPU to be able to output to my 4 high-res displays, and install Debian + Moonlight on it. I moved my gaming PC downstairs, and a combination of wake-on-LAN + Sunshine means that I can game at functionally native performance, streaming from the basement. In my setup, Windows only exists to play games on.

The added bonus here is now I can also stream games to my phone, or other ~thin clients~ in the house, saving me upgrade costs if I want to play something in the living room or upstairs. All you need is the bare minimum for native-framerate, native-res decoding, which you can find in just about anything made in the last 5-10 years.
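The wake-on-LAN half of this needs no special software at all: a magic packet is just six 0xFF bytes followed by the target's MAC address repeated 16 times, broadcast over UDP (port 9 by convention). A minimal sketch, with a placeholder MAC:

```python
import socket

def build_magic_packet(mac: str) -> bytes:
    """A WOL magic packet: six 0xFF bytes, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(build_magic_packet(mac), (broadcast, port))

# Usage (placeholder MAC -- substitute the gaming PC's NIC):
# wake_on_lan("aa:bb:cc:dd:ee:ff")
```

The target NIC has to have wake-on-LAN enabled in firmware/driver settings for the packet to do anything.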

[–] Dran_Arcana@lemmy.world 24 points 1 month ago (2 children)

"Open source" in ML is a really bad description for what it is. "Free binary with a bit of metadata" would be more accurate. The code used to create deepseek is not open source, nor is the training datasets. 99% of "open source" models are this way. The only interesting part of the open sourcing is the architecture used to run the models, as it lends a lot of insight into the training process, and allows for derivatives via post-training

[–] Dran_Arcana@lemmy.world 1 points 7 months ago

Fail2ban and containers can be tricky, because under the hood, you'll often have container policies automatically inserting themselves above host policies in iptables. The docker documentation has a good write-up on how to solve it for their implementation

https://docs.docker.com/engine/network/packet-filtering-firewalls/

For your use case specifically: if you're using VMs only, you could run it within any VM that is exposing traffic, but for containers you'll have to run fail2ban on the host itself. I'm not sure how LXC handles this, but I assume it's probably similar to Docker.

The simplest solution would be to just put something between your hypervisor and the Internet physically (a Raspberry-Pi-based firewall, etc.).
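For the Docker case, the usual fix (per the linked docs) is to have fail2ban insert its bans into the `DOCKER-USER` chain rather than `INPUT`, since container traffic is forwarded and never hits `INPUT`. A sketch of what that might look like as a jail override; the jail name and port are placeholders, and you should verify the `chain` setting against your fail2ban version:

```ini
# /etc/fail2ban/jail.d/docker.local -- hypothetical jail override
[myservice]
enabled = true
port    = 8080
# Ban in DOCKER-USER so the rule fires before Docker's own FORWARD rules
chain   = DOCKER-USER
```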
