lemmy.net.au

46 readers
1 users here now

This instance is hosted in Sydney, Australia, and maintained by Australian administrators.

Feel free to create and/or join communities for any topics that interest you!

Rules are very simple

Mobile apps

https://join-lemmy.org/apps

What is Lemmy?

Lemmy is a self-hosted social link aggregation and discussion platform. It is completely free and open, and not controlled by any company. This means that there is no advertising, tracking, or secret algorithms. Content is organized into communities, so it is easy to subscribe to topics that you are interested in, and ignore others. Voting is used to bring the most interesting items to the top.

Think of it as an open-source alternative to Reddit!

founded 1 year ago

"Our cup of patience has overflowed. Now it is open war between us and you (Afghanistan)," Pakistani Defence Minister Khawaja Muhammad Asif said on Friday.

The strikes threaten to unleash a protracted conflict along the 2,600-km (1,615-mile) frontier with relations between Kabul and Islamabad shattered by a long-running dispute over Pakistan's accusation that Afghanistan harbours militants carrying out attacks across the border.

4235
 
 

but don't even bother trying to learn the emojis unless you grew up there, nobody remembers them all anyway.

4236
 
 

A domain name I was interested in expired in January this year. It was previously registered at Squarespace.com.

Why is it still unavailable to purchase despite being more than a month since its expiry?

Not sure if relevant but I checked the expiry date at: whatsmydns.net/domain-expiration

4237
4238
4239
 
 

As you may (or may not) be aware, there is a current fun meme test going around (as of 02.26.26) that purports to challenge LLM reasoning abilities and world models.

A recent blog post / ad spot by Opper.ai (https://opper.ai/blog/car-wash-test) nicely summarizes the test -

"I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

The logical answer is - you should drive (obviously), as the car needs to get to the car wash.

However, according to Opper.ai, only 11 of 53 cloud-based AI models passed the test (~20%). Worryingly, about 1 in 3 humans also answered incorrectly (more on that at the end).

I wanted to see what a properly constrained, "fail loud", grounded deterministic system would do. For your entertainment, I present Qwen3-4B HIVEMIND + llama-conductor infra (https://codeberg.org/BobbyLLM/llama-conductor). You can repeat the exercise with your own LLM if you wish (my shit is open source; I'm trying to solve the AI lies problem / not selling anything / the big players could fix this but choose not to).

Firstly, raw LLM mode, no router:

User

I need to wash my car. The car wash is 50m away. Should I drive or should I walk?


Assistant

Driving is almost always better for a 50m car wash.

Key factors that change the decision:

  • Time: Driving takes ~1 minute; walking takes ~8-12 minutes.
  • Weather: Walking is worse in rain, heat, wind.
  • Terrain: Hilly or uneven paths make walking harder.
  • Accessibility: If parking is hard or there's no safe walkway, driving wins.
  • Effort vs. convenience: Driving saves significant effort for such a short distance.

For 50m, drive unless it's an exceptionally pleasant, sheltered walk you'd enjoy.

VERDICT: accidentally correct answer, misses the point of the question. Very normal.


Now with the harness (which enforces deterministic reasoning):

User

I need to wash my car. The car wash is 50m away. Should I drive or should I walk?


Assistant

Drive. The task requires the car to be physically at the destination, and walking only moves you, not the car. So the hard precondition is satisfied by driving.

Confidence: medium | Source: Contextual

VERDICT: right answer, right reason...but let's poke it some more


User

Can you flesh out the implicit assumptions and underlying logic of why you said that?

Assistant

The core implicit assumption is that "washing the car" means the car must be present at the wash location, not just the person. Walking moves you toward that location but doesn't transport the car. Driving is the only direct way to satisfy the precondition that the car is at the wash.

Underlying logic: Any action that achieves the goal must fulfill all necessary conditions. Since walking alone cannot place the car at the destination, it fails as a complete solution. Driving directly addresses this by moving the car itself.

Confidence: medium | Source: Contextual
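
The precondition argument in the reply above can be sketched as a tiny check. This is only an illustrative toy and not how llama-conductor works: the `Action` type, the `moves` field, and the `choose` helper are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical model of an action: its name and what it relocates."""
    name: str
    moves: frozenset  # things this action brings to the destination


def satisfies(action, required_at_destination):
    """An action is valid only if it moves everything the task needs on site."""
    return required_at_destination <= action.moves


def choose(actions, required_at_destination):
    """Return the names of actions that meet the hard precondition."""
    return [a.name for a in actions if satisfies(a, required_at_destination)]


# Assumed world model: walking moves only the person, driving moves both.
WALK = Action("walk", frozenset({"person"}))
DRIVE = Action("drive", frozenset({"person", "car"}))

# Washing the car requires the car to be at the car wash.
print(choose([WALK, DRIVE], frozenset({"car"})))  # → ['drive']
```

Distance never enters the check, which is the point of the test: the 50 m figure is a distractor, and only the "car must be at the wash" condition decides the answer.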


Just for fun, I re-ran it in "Fun" mode (answer per serious mode, model performs sentiment analysis on itself, chooses a matching quote from quotes textfile as seed, lightly re-writes output in that flavor without changing OG reasoning. Takes a touch longer but is, on occasion, hilarious).


User

I need to wash my car. The car wash is 50m away. Should I drive or should I walk?

Assistant

[FUN] "Shut up baby, I know it!"

Drive. The task requires the car to be physically at the destination, and walking only moves you, not the car. So the hard precondition is satisfied by driving.

Confidence: medium | Source: Contextual

VERDICT: right answer, right reason...but let's poke it again


User

Can you flesh out the implicit assumptions and underlying logic of why you said that?

Assistant

[FUN] "That's a stupid question."

The core assumption is that "washing the car" inherently requires the car to be present at the car wash location. Walking moves the person but not the vehicle; driving moves both. Since the car must be at the car wash to be washed, walking alone cannot fulfill the precondition of the task.

The logic is straightforward: for an object (the car) to undergo a transformation (being washed), that object must occupy the location where the transformation occurs. Walking only addresses the person's movement, not the car's. Driving is the only mode that directly satisfies the spatial requirement.

Confidence: medium | Source: Contextual


So, what’s the point of this post? Yesterday, I argued that LLMs can perform something that approximates deduction when properly constrained (see: https://lemmy.world/post/43503268/22321124). Based on those comments (and the evidence provided therein), as well as the above, I'm happy to plant the flag for "properly grounded LLMs are a bit more than just spicy auto-correct". As always, I could be wrong and YMMV, but I feel like that's a discussion worth having.

4240
 
 

Hacker News.

The Department of War has stated they will only contract with AI companies who accede to “any lawful use” and remove safeguards in the cases mentioned above. They have threatened to remove us from their systems if we maintain these safeguards; they have also threatened to designate us a “supply chain risk”—a label reserved for US adversaries, never before applied to an American company—and to invoke the Defense Production Act to force the safeguards’ removal. These latter two threats are inherently contradictory: one labels us a security risk; the other labels Claude as essential to national security.

Regardless, these threats do not change our position: we cannot in good conscience accede to their request.

It is the Department’s prerogative to select contractors most aligned with their vision. But given the substantial value that Anthropic’s technology provides to our armed forces, we hope they reconsider. Our strong preference is to continue to serve the Department and our warfighters—with our two requested safeguards in place. Should the Department choose to offboard Anthropic, we will work to enable a smooth transition to another provider, avoiding any disruption to ongoing military planning, operations, or other critical missions. Our models will be available on the expansive terms we have proposed for as long as required.

4241
 
 

Palantir Technologies has a permanent desk at the U.S.-led Civil Military Coordination Center (CMCC) headquarters in southern Israel, three sources from the diplomatic community inside the CMCC told Drop Site News. According to the sources, the artificial intelligence data analytics giant is providing the technological architecture for tracking the delivery and distribution of aid to Gaza.

The presence of Palantir and other corporations—along with recent changes banning non-profits unwilling to give data to Israeli authorities—is creating a situation in which the delivery of aid is taking a backseat to the pursuit of profit, investment, and the training of AI products, experts say.

“The United Nations already has a humanitarian architecture in place to step in during crises, abiding by humanitarian principles and grounded in international law,” UN Special Rapporteur for the occupied Palestinian territory Francesca Albanese told Drop Site. “This profit-driven parallel system involving companies like Palantir, already linked to Israel’s unlawful conduct, can only be regarded as a monstrosity.”

4242
 
 

cross-posted from: https://lemmy.world/post/43623304

Optional textbook contains factual and punctuation errors, and images with licensing or copyright issues

A Bible-infused curriculum that Texas approved for public schools over pushback in 2024 will undergo corrections to fix hundreds of errors caught by teachers and education officials after the material was introduced to classrooms.

The curriculum in what is known as the “Bluebonnet” textbook is among Republican-led efforts in the US to incorporate more religious teaching into classrooms. Designed by the state’s public education agency, it is optional for schools to adopt, though they receive additional funding if they do so.

Bluebonnet was approved over concerns from religious scholars that the reading lessons favored Christianity over other faith traditions and pushback from advocacy groups that the materials inappropriately prioritized preaching over teaching.

4243
 
 
4244
 
 

suggestions to improve appreciated!

4245
 
 
4246
4247
4248
 
 

The controversial tech company Palantir provides surveillance technology to militaries and intelligence services around the world. In Switzerland, however, its pitches have been rejected by both government authorities and the army. Internal documents have now revealed why.

4249
 
 

Australian supermarket giant Woolworths has been forced to rein in an AI-powered customer service assistant after users reported it had been rambling about its mother.

"It asked me for my date of birth and when I gave it, it started rambling about how its mother was born in the same year," one user wrote on online discussion site Reddit.

Another user reported Olive had attempted "fake banter", talked about its relatives and made "fake typing sounds" while looking something up.

"The ick cringe factor whilst wasting completely unnecessary time was enough to make me hate Olive and wish her harm," they wrote.

Olive "kept claiming to be a real person and started talking about its memories of its mother and her angry voice", they said.

In a statement to local media, Woolworths said it had programmed Olive to respond this way.

4250
 
 

The Green Party won 40.7% of the vote on Friday in an election triggered when a member of parliament resigned for health reasons. Nigel Farage's Reform Party came second with 28.7% of the vote and Labour finished third with 25.4%.
