No Stupid Questions
No such thing. Ask away!
!nostupidquestions is a community dedicated to being helpful and answering each other's questions on various topics.
The rules for posting and commenting, besides the rules defined here for lemmy.world, are as follows:
Rules
Rule 1- All posts must be legitimate questions. All post titles must include a question.
All posts must be legitimate questions, and all post titles must include a question. Joke or trolling questions, memes, song lyrics as titles, and the like are not allowed here. See Rule 6 for the exceptions.
Rule 2- Your question subject cannot be illegal or NSFW material.
Your question subject cannot be illegal or NSFW material. You will be warned first, banned second.
Rule 3- Do not seek mental, medical, or professional help here.
Do not seek mental, medical, or professional help here. Breaking this rule will not get you or your post removed, but it will put you at risk, and possibly in danger.
Rule 4- No self promotion or upvote-farming of any kind.
That's it.
Rule 5- No baiting, sealioning, or promoting an agenda.
Questions which, instead of being innocuous, are specifically intended (based on reports and in the opinion of our crack moderation team) to bait users into ideological wars on charged political topics will be removed, and the authors warned or banned depending on severity.
Rule 6- Regarding META posts and joke questions.
Provided it is about the community itself, you may post non-question posts using the [META] tag on your post title.
On Fridays, you are allowed to post meme and troll questions, on the condition that they are in text format only and conform to our other rules. These posts MUST include the [NSQ Friday] tag in their title.
If you post a serious question on Friday and are looking only for legitimate answers, then please include the [Serious] tag on your post. Irrelevant replies will then be removed by moderators.
Rule 7- You can't intentionally annoy, mock, or harass other members.
If you intentionally annoy, mock, harass, or discriminate against any individual member, you will be removed.
Likewise, if you are a member of, a sympathiser with, or otherwise aligned with a movement known to largely hate, mock, discriminate against, and/or seek to take the lives of a group of people, and you have been provably vocal about that hate, then you will be banned on sight.
Rule 8- All comments should try to stay relevant to their parent content.
Rule 9- Reposts from other platforms are not allowed.
Let everyone have their own content.
Rule 10- The majority of bots aren't allowed to participate here. This includes posting AI-generated responses and summaries.
Credits
Our breathtaking icon was bestowed upon us by @Cevilia!
The greatest banner of all time: by @TheOneWithTheHair!
As someone who is completely blind, I pay for OpenRouter in order to have AI describe images to me. If more people bothered with alt text, I wouldn't have to, but it is what it is.

I suspect there are models I could run locally that would do what I need; on iOS, Apple handles all image descriptions locally on the phone, and they're perfectly adequate. But on Windows, nobody has created an easy way to get a local model running in the open-source NVDA screen reader (https://www.nvaccess.org/), though there are multiple add-ons that work with OpenRouter. NVDA is entirely written in Python, so it should actually be pretty easy to do.

The main reason I haven't tried it myself is that I have no idea what local model to use. None of the benchmarks really tell me "this model would be good at describing images to blind people," whereas the giant cloud models are semi-okay at everything, so everyone just uses those. But if we could use a smaller model, we might even be able to fine-tune it for the specific use case of blind people. Maybe someday!
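For anyone curious what such an add-on would have to do under the hood, here's a minimal sketch in Python (NVDA's language), assuming OpenRouter's OpenAI-compatible chat completions endpoint. The model name, prompt, and function name are placeholders of my own, not code from any existing add-on:

```python
# A minimal sketch (not an existing add-on): post an image to OpenRouter's
# OpenAI-compatible chat completions endpoint and read back the description.
# The model name and prompt below are placeholders.
import base64
import requests

def describe_image(path: str, api_key: str,
                   model: str = "google/gemini-2.5-pro") -> str:
    # Encode the image as a base64 data URL, the standard way to send
    # local images through OpenAI-compatible multimodal APIs.
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe this image concisely for a blind user."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

An add-on would then hand that string to NVDA's speech output; the hard part, as noted above, is picking the model, not the plumbing.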
I recommend Google's gemma-3-4b-it-qat in LM Studio, running on an i7-13650HX / RTX 4060 laptop.
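If you want to try that without the cloud, here's a rough sketch of querying it through LM Studio's built-in local server, which speaks the OpenAI-compatible API on localhost:1234 by default. The filename, prompt, and exact model identifier are assumptions on my part:

```python
# A rough sketch, assuming LM Studio's local server is running with
# gemma-3-4b-it-qat loaded. LM Studio serves an OpenAI-compatible API
# on localhost:1234 by default; no API key is needed locally.
import base64
import requests

with open("turtle.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        # Use whatever ID LM Studio shows for the loaded model.
        "model": "gemma-3-4b-it-qat",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```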
Image link: https://images.squarespace-cdn.com/content/v1/58586fa5ebbd1a60e7d76d3e/1493824812986-X5XRILWEPX6FAD5I19ZU/image-asset.jpeg?format=2500w
The scene is just a normal-sized turtle with a house built on top of its shell, plus a bird sitting on a tree stump. The AI didn't do the best job, BUT it is only 4B params; it can run on phones, or really fast on a GPU. Naturally, with all their YouTube video data, Google has some of the best computer vision models.
I will also include Grok for comparison. It's the model I use for images that I think might contain adult content (pornography, violence, whatever), because it won't refuse anything. Sadly, because I'm boring, I'm using it more for moderation than pleasure. Unfortunately, I have a prompt for it designed to limit refusals, and I forgot to change it. As you can see, that... strongly affected the description. Suddenly there are tits in this picture.
Thanks! It's a bit wordy, but I suspect that could be improved with prompting and some examples.
For comparison, here's the full Gemini 2.5 Pro in the cloud:
How heavy is your usage, and how much does it cost? I've always thought this is literally the best thing AI is actively doing.
It really depends. For images that are graphs and infographics, I use GPT-5 or Gemini 2.5 Pro. For anything with adult content I have to use Grok, because it's the only model that won't refuse. For stuff that's just text in an image, the cheap models from Microsoft are fine. Also, OpenRouter sometimes has limited-time deals where some models are free. I'd say overall I spend between $2 and $5 a month on it. But I do allow OpenRouter to train on the data, so I get a discount of a few percent as well.
Did you have to get somebody to set your gear up for you or can you somehow do it all yourself?
These days I can do it all myself. Press control+windows+enter when Windows first boots, and the basic built-in screen reader that's part of Windows 11 comes on. It's good enough to get through set-up and install a better screen reader. Sadly, if I were on Linux, that wouldn't at all be the case. Though I do run multiple Linux servers via SSH, including all of the infrastructure for rblind.com.
I did manage to assemble my DIY Framework 16 laptop, and I'll upgrade the mainboard in it later this year, but that's pretty much the limit of my hardware skills. Soldering is right out, and oh my God do I hate those damn IPEX connectors.
OpenRouter is pay-per-token, so cost depends on usage and which model is being used.
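For a rough sense of scale (with made-up numbers, since prices vary a lot by model): at a hypothetical $1 per million input tokens, an image that tokenizes to around 1,000 tokens costs about $0.001 to describe, so a few hundred descriptions a month on a cheap model is pennies; it's the frontier models that push the bill into the few-dollar range mentioned above.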