No Stupid Questions
No such thing. Ask away!
!nostupidquestions is a community dedicated to being helpful and answering each other's questions on various topics.
The rules for posting and commenting, besides the rules defined here for lemmy.world, are as follows:
Rules
Rule 1- All posts must be legitimate questions. All post titles must include a question.
All posts must be legitimate questions, and all post titles must include a question. Joke or trolling questions, memes, song lyrics as titles, and the like are not allowed here. See Rule 6 for the exceptions.
Rule 2- Your question subject cannot be illegal or NSFW material.
Your question subject cannot be illegal or NSFW material. You will be warned first, banned second.
Rule 3- Do not seek mental, medical, or professional help here.
Do not seek mental, medical, or professional help here. Breaking this rule will not get you or your post removed, but it will put you at risk, and possibly in danger.
Rule 4- No self promotion or upvote-farming of any kind.
That's it.
Rule 5- No baiting, sealioning, or promoting an agenda.
Questions which, instead of being innocuous, are specifically intended (based on reports and in the opinion of our crack moderation team) to bait users into ideological wars on charged political topics will be removed, and the authors warned or banned depending on severity.
Rule 6- Regarding META posts and joke questions.
Provided it is about the community itself, you may post non-question posts using the [META] tag on your post title.
On Fridays, you are allowed to post meme and troll questions, on the condition that they are in text format only and conform to our other rules. These posts MUST include the [NSQ Friday] tag in their title.
If you post a serious question on Friday and are looking only for legitimate answers, then please include the [Serious] tag on your post. Irrelevant replies will then be removed by moderators.
Rule 7- You can't intentionally annoy, mock, or harass other members.
If you intentionally annoy, mock, harass, or discriminate against any individual member, you will be removed.
Likewise, if you are a member of, a sympathiser with, or otherwise aligned with a movement known to largely hate, mock, discriminate against, and/or seek to take the lives of a group of people, and you have been provably vocal about that hate, then you will be banned on sight.
Rule 8- All comments should try to stay relevant to their parent content.
Rule 9- Reposts from other platforms are not allowed.
Let everyone have their own content.
Rule 10- The majority of bots aren't allowed to participate here. This includes posting AI-generated responses and summaries.
Credits
Our breathtaking icon was bestowed upon us by @Cevilia!
The greatest banner of all time: by @TheOneWithTheHair!
As someone who is completely blind, I pay for OpenRouter in order to have AI describe images to me. If more people bothered with alt text, I wouldn't have to, but it is what it is.

I suspect there are models I could run locally that would do what I need; on iOS, Apple handles all image descriptions locally on the phone, and they're perfectly adequate. But on Windows, nobody has created an easy way to get a local model running in the open-source NVDA screen reader (https://www.nvaccess.org/), though there are multiple add-ons that work with OpenRouter. NVDA is entirely written in Python, so it should actually be pretty easy to do.

The main reason I haven't tried it myself is that I have no idea what local model to use. None of the benchmarks really tell me "this model would be good at describing images to blind people," whereas the giant cloud models are semi-okay at everything, so everyone just uses those. But if we could use a smaller model, we might even be able to fine-tune it for the specific use case of blind people. Maybe someday!
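For anyone curious what such an add-on would have to do under the hood, here's a minimal sketch in Python (NVDA's language), assuming OpenRouter's OpenAI-compatible chat completions endpoint. The model name, prompt, and function name are placeholders of my own, not code from any existing add-on:

```python
# A minimal sketch (not an existing add-on): post an image to OpenRouter's
# OpenAI-compatible chat completions endpoint and read back the description.
# The model name and prompt below are placeholders.
import base64
import requests

def describe_image(path: str, api_key: str,
                   model: str = "google/gemini-2.5-pro") -> str:
    # Encode the image as a base64 data URL, the standard way to send
    # local images through OpenAI-compatible multimodal APIs.
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe this image concisely for a blind user."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

An add-on would then hand that string to NVDA's speech output; the hard part, as noted above, is picking the model, not the plumbing.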
I recommend Google's gemma-3-4b-it-qat in LM Studio, running on an i7-13650HX / RTX 4060 laptop.
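If you want to try that without the cloud, here's a rough sketch of querying it through LM Studio's built-in local server, which speaks the OpenAI-compatible API on localhost:1234 by default. The filename, prompt, and exact model identifier are assumptions on my part:

```python
# A rough sketch, assuming LM Studio's local server is running with
# gemma-3-4b-it-qat loaded. LM Studio serves an OpenAI-compatible API
# on localhost:1234 by default; no API key is needed locally.
import base64
import requests

with open("turtle.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        # Use whatever ID LM Studio shows for the loaded model.
        "model": "gemma-3-4b-it-qat",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```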
Image link: https://images.squarespace-cdn.com/content/v1/58586fa5ebbd1a60e7d76d3e/1493824812986-X5XRILWEPX6FAD5I19ZU/image-asset.jpeg?format=2500w
The scene is just a normal-sized turtle with a house built on top of its shell, plus a bird sitting on a tree stump. The AI didn't do the best job, BUT it is only 4B params; it can run on phones, or really fast on a GPU. Naturally, with all their YouTube video data, Google has some of the best computer vision models.
I will also include Grok for comparison. It's the model I use for images that I think might contain adult content (pornography, violence, whatever), because it won't refuse anything. Sadly, because I'm boring, I'm using it more for moderation than pleasure. Unfortunately, I have a prompt for it designed to limit refusals, and I forgot to change it. As you can see, that... strongly affected the description. Suddenly there are tits in this picture.
Thanks! It's a bit wordy, but I suspect that could be improved with prompting and some examples.
For comparison, here's the full Gemini 2.5 Pro in the cloud:
How heavy is your usage, and how much does it cost? I've always thought this is literally the best thing AI is actively doing.
It really depends. For images that are graphs and infographics, I use GPT-5 or Gemini 2.5 Pro. For anything with adult content I have to use Grok, because it's the only model that won't refuse. For stuff that's just text in an image, the cheap models from Microsoft are fine. Also, OpenRouter sometimes has limited-time deals where some models are free. I'd say overall I spend between $2 and $5 a month on it. But I do allow OpenRouter to train on the data, so I get a discount of a few percent as well.
Did you have to get somebody to set your gear up for you or can you somehow do it all yourself?
These days I can do it all myself. Press control+windows+enter when Windows first boots, and the basic built-in screen reader that's part of Windows 11 comes on. It's good enough to get through set-up and install a better screen reader. Sadly, if I were on Linux, that wouldn't at all be the case. Though I do run multiple Linux servers via SSH, including all of the infrastructure for rblind.com.
I did manage to assemble my DIY Framework 16 laptop, and I'll upgrade the mainboard in it later this year, but that's pretty much the limit of my hardware skills. Soldering is right out, and oh my God do I hate those damn IPEX connectors.
OpenRouter is pay-per-token, so cost depends on usage and which model is being used.
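For a rough sense of scale (with made-up numbers, since prices vary a lot by model): at a hypothetical $1 per million input tokens, an image that tokenizes to around 1,000 tokens costs about $0.001 to describe, so a few hundred descriptions a month on a cheap model is pennies; it's the frontier models that push the bill into the few-dollar range mentioned above.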