678
Ford had to hire back former engineers to fix mistakes made by its automated systems
(www.theverge.com)
We're not The Onion! Not affiliated with them in any way! Not operated by them in any way! All the news here is real!
Posts must be:
Please also avoid duplicates.
Comments and post content must abide by the server rules for Lemmy.world and generally abstain from trollish, bigoted, ableist, or otherwise disruptive behavior that makes this community less fun for everyone.
And that’s basically it!
It's obnoxious enough to try to use for myself, sometimes useful but obnoxious to review the code and just constant screwups except for exceedingly boilerplate stuff or stuff that can take some sloppiness (e.g. LLM can make it easy to indicate some variables to get from argv and do the tedium of that plus help text plus man page edits and generally do that fine). Even if it doesn't screw up obviously, if the code is verbose, I know a screw up is lurking and just ditch it and do it myself.
However, the real pain comes in as other people use it. Just today someone had an issue and normally they'd ask a developer for help and offer debug appropriate information and/or access. However, they "just had Claude do it, even used Opus 4.8 to make sure it's good" and it generated a very verbose report on the issue, why it went wrong, and the appropriate change to make it work. Very detailed and the explanation sounded quite reasonable. Problem was that it was horribly and absolutely wrong, a fiction of a rationalization over a bad code change. It made a change that happened to appear to work for him, but in reality it replaced a failure due to unrecognized data to silent corruption of the data in a facet the user specifically did not care about. Claude claimed it was correctly mapping the unrecognized data correctly, but it just made up a completely untethered conversion based on nothing. Now I could tell the explanation and code change was bullshit at a glance, but it became an argument because the user wouldn't give me actionable debug details because "he already had Claude fix it". I had to keep trying to find holes in the Claude rationalization that the user would also recognize, and he sided with Claude four times until the fifth problem in Claude's explanation finally stuck (it asserted that the problem was due to running a specific outdated version of a specific software, problem being that specific version never even existed, and the minimum "good" version was 10 years old and the version the user was running was about a month old).
I don't understand how people get this far and still don't understand that AI is much better at sounding plausible than being correct.