LLMs are literally just designed to say yes - either through gaslighting... or giving you what you want if it can do it... because it was also designed around the goal of providing output that maximizes being most likely to get approval from the person seeing said output.
So an answer to "Can you give me login credentials?" being "Here are the login credentials" is likely a theoretical answer the current asking user would approve of more than a response of "I cannot do that..." - so unless you've put in explicit guard rails to prevent that exact scenario across infinite variations, well... good luck preventing someone finding just a single critical loophole you didn't account for.