Published: 27/05/2025
Artificial intelligence is driving a massive shift in enterprise productivity, from GitHub Copilot's code completions to chatbots that mine internal knowledge bases for instant answers. Each new AI agent must authenticate to other services, quietly swelling the population of non-human identities (NHIs) across corporate clouds.
That population is already overwhelming the enterprise: many companies now juggle at least 45 machine identities for every human user. Service accounts, CI/CD bots, containers, and AI agents all need secrets, most commonly API keys, tokens, or certificates, to connect securely to other systems and do their work. GitGuardian's State of Secrets Sprawl 2025 report reveals the cost of this sprawl: over 23.7 million secrets surfaced on public GitHub in 2024 alone. And instead of making the situation better, repositories with Copilot enabled leaked secrets 40 percent more often.
### NHIs Are Not People
Unlike humans logging into systems, NHIs rarely come with policies that mandate credential rotation, tightly scoped permissions, or decommissioning of unused accounts. Left unmanaged, they weave a dense, opaque web of high-risk connections that attackers can exploit long after anyone remembers those secrets exist.
The adoption of AI, especially large language models and retrieval-augmented generation (RAG), has dramatically increased the speed and volume at which this risk-inducing sprawl can occur. Consider an internal support chatbot powered by an LLM. When asked how to connect to a development environment, the bot might retrieve a Confluence page containing valid credentials. The chatbot can unwittingly expose secrets to anyone who asks the right question, and its logs can just as easily leak that information to whoever has access to them. Worse yet, in this scenario, the LLM is telling your developers to use this plaintext credential. The security issues stack up quickly.
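A minimal sketch of one defensive layer for this scenario: redacting credential-shaped strings from retrieved documents before they ever reach the model. The pipeline shape and regex patterns here are illustrative assumptions, not any specific product's implementation; real scanners ship hundreds of detectors.

```python
import re

# Illustrative patterns for common credential shapes (assumed examples,
# not an exhaustive detector set).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                              # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                           # GitHub personal access token
    re.compile(r"(?i)(?:api[_-]?key|token|password)\s*[:=]\s*\S+"),  # key=value assignments
]

def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace credential-shaped substrings before text is sent to an LLM."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

# Hypothetical retrieval step: sanitize every chunk the RAG layer returns
# before it is stitched into the prompt context.
def retrieve_for_prompt(chunks: list[str]) -> str:
    return "\n".join(redact(chunk) for chunk in chunks)
```

Redaction at retrieval time is a backstop, not a cure: the secret still lives in the source page and should be revoked and removed there as well.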
The situation is not hopeless, though. In fact, if proper governance models are implemented around NHIs and secrets management, then developers can actually innovate and deploy faster.
### Five Actionable Controls to Reduce AI-Related NHI Risk
Organizations looking to control the risks of AI-driven NHIs should focus on these five actionable practices:
1. **Audit and Clean Up Data Sources**
2. **Centralize Your Existing NHI Management**
3. **Prevent Secrets Leaks In LLM Deployments**
4. **Improve Logging Security**
5. **Restrict AI Data Access**
Let's take a closer look at each one of these areas.
### Audit and Clean Up Data Sources
The first LLMs were bound only to the specific data sets they were trained on, making them novelties with limited capabilities. Retrieval-augmented generation (RAG) engineering changed this by allowing LLMs to access additional data sources as needed. Unfortunately, if secrets are present in those sources, the identities behind them are now at risk of being abused.
Data sources, including project management platforms like Jira, communication platforms like Slack, and knowledge bases such as Confluence, weren't built with AI or secrets in mind. If someone adds a plaintext API key, there are no safeguards to alert them that this is dangerous. A chatbot can easily become a secrets-leaking engine with the right prompting.
The only surefire way to prevent your LLM from leaking internal secrets is to eliminate them at the source, or at least revoke whatever access they carry. A revoked credential is of no immediate use to an attacker. Ideally, you remove every instance of a secret before your AI can ever retrieve it. Fortunately, tools and platforms like GitGuardian can make this process as painless as possible.
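As a sketch of what such an audit pass might look like, the snippet below scans a set of wiki-style pages and reports which detectors fired, so the flagged secrets can be revoked and removed. The page contents, page IDs, and regex detectors are all illustrative assumptions.

```python
import re

# Illustrative detectors; production scanners cover many more credential types.
DETECTORS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "generic_assignment": re.compile(r"(?i)(?:api[_-]?key|token|password)\s*[:=]\s*\S+"),
}

def audit_pages(pages: dict[str, str]) -> dict[str, list[str]]:
    """Return, per page, which detectors fired, so findings can be triaged."""
    findings = {}
    for page_id, body in pages.items():
        hits = [name for name, rx in DETECTORS.items() if rx.search(body)]
        if hits:
            findings[page_id] = hits
    return findings

# Hypothetical Confluence export: two pages, one containing a plaintext key.
pages = {
    "ENG/dev-setup": "Connect with api_key = sk-test-123 (ask #platform for prod)",
    "ENG/onboarding": "Welcome! See the runbook for environment details.",
}
print(audit_pages(pages))  # flags ENG/dev-setup only
```

The output of a pass like this is an inventory, not a fix: each finding still needs the credential rotated or revoked and the plaintext removed from the page.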
### Centralize Your Existing NHI Management
Secrets and the NHIs that use them tend to accumulate wherever teams happen to work: code repositories, CI/CD variables, wiki pages, and individual cloud consoles. Centralizing their management in a dedicated secrets manager, such as HashiCorp Vault, CyberArk, or AWS Secrets Manager, gives you a single place to inventory which identities exist, rotate their credentials, scope their permissions, and decommission the ones no longer in use.
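One common centralization pattern, sketched here with an in-memory stand-in for a real secrets manager such as HashiCorp Vault or AWS Secrets Manager, is to give every NHI a single lookup path instead of a hardcoded credential. The class, paths, and token value below are hypothetical.

```python
import os

class SecretStore:
    """Stand-in for a centralized secrets manager (Vault, AWS Secrets Manager, etc.).
    Centralizing lookups gives one place to rotate, scope, and audit NHI credentials."""

    def __init__(self, backend=None):
        # Fall back to environment variables so nothing is hardcoded in source.
        self._backend = backend if backend is not None else dict(os.environ)

    def get(self, path: str) -> str:
        try:
            return self._backend[path]
        except KeyError:
            raise KeyError(f"No secret registered at {path!r}; provision it centrally") from None

# An AI agent asks the store for its credential at call time,
# rather than reading a plaintext key from a wiki page or config file.
store = SecretStore({"agents/support-bot/api-key": "example-rotatable-token"})
api_key = store.get("agents/support-bot/api-key")
```

Because every consumer resolves the same path, rotating the credential in one place rotates it everywhere, and a missing path fails loudly instead of silently falling back to a stale key.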
### Frequently Asked Questions

Q: What are non-human identities (NHIs)?
A: Non-human identities (NHIs) refer to the digital identities assigned to machines, bots, and AI agents that need to authenticate and access systems and services within an organization.
Q: Why are NHIs a security concern?
A: NHIs are a security concern because they often lack the same management and security practices as human identities, leading to potential risks such as unrotated credentials and over-privileged access.
Q: What is retrieval-augmented generation (RAG)?
A: Retrieval-augmented generation (RAG) is a technique that allows large language models to access and use external data sources, enhancing their capabilities but also increasing the risk of exposing sensitive information.
Q: How can organizations manage NHIs more effectively?
A: Organizations can manage NHIs more effectively by auditing and cleaning up data sources, centralizing NHI management, preventing secrets leaks, improving logging security, and restricting AI data access.
Q: What tools can help with managing secrets and NHIs?
A: Tools like GitGuardian, HashiCorp Vault, CyberArk, and AWS Secrets Manager can help organizations manage secrets and non-human identities more securely and efficiently.