Secure generative AI for the enterprise requires the right permissions structure

Jason Rudianto

Engineering

To get the most out of enterprise search and generative AI solutions, employees need to be able to trust them. For that to happen, the information fed into the model must strictly follow the permissioning rules set for each and every document. Without that guarantee, workers have no way of knowing whether confidential documents will stay confidential. Users can lose confidence in the platform, hurting its usability and slowing adoption of potentially transformative new technologies.

Properly built permissioning ensures that generative AI systems only access and utilize data that they’re explicitly allowed to, regardless of app or user. For instance, Glean's AI assistant is fully permissions-aware and personalized, only sourcing information the user has explicit access to. This not only maintains the security and confidentiality of sensitive data but also enhances the relevance and personalization of the AI's output.

If you’re looking to build or integrate generative AI for work, a robust permission-awareness framework needs to be at the top of the list for making it ready for business environments. In this blog, I’ll share a little more about why and how permissioning needs to be prioritized, along with how we structure our own enterprise-ready solution.

Comprehensive yet nuanced

A great generative AI solution needs to have access to all your content and activity – it’s the only way to ensure results are always relevant, recent, and nuanced enough to deliver exactly what you need for each query. However, that poses a challenge when it comes to permissioning and privacy, as strict data access rules need to be followed across all levels in order to ensure sensitive information is kept safe.

Adhering to data access rules across multiple levels maintains security

For example, from the minute Glean is deployed, permissioning and data access rules are a key priority. Glean follows the principle of least privilege: users are given only the minimum level of access they are authorized to have. This enables our system to strictly enforce data access rules that exist at multiple levels, across all applications.
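To make the idea concrete, here is a minimal sketch of permission-aware retrieval with a default-deny check. It is an illustration only, not Glean's implementation: the `Document` shape and the `visible_documents` and `build_prompt` helpers are hypothetical names, and it assumes each document carries an explicit set of allowed users.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    # Hypothetical document record; real connectors carry far more metadata.
    doc_id: str
    text: str
    allowed_users: frozenset = field(default_factory=frozenset)


def visible_documents(user, candidates):
    """Keep only the documents this user is explicitly allowed to read."""
    # Default deny: a document without a matching grant is dropped.
    return [doc for doc in candidates if user in doc.allowed_users]


def build_prompt(user, question, candidates):
    """Assemble model context from permitted documents only."""
    context = "\n".join(doc.text for doc in visible_documents(user, candidates))
    return f"Context:\n{context}\n\nQuestion: {question}"


docs = [
    Document("handbook", "PTO policy: 20 days per year.", frozenset({"ana", "ben"})),
    Document("salaries", "Compensation bands for 2024.", frozenset({"ana"})),
]
prompt_for_ben = build_prompt("ben", "How much PTO do I get?", docs)
```

The filter runs before any text reaches the model, so a document the user cannot open never becomes part of the generated answer.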

Concurrency and scalability

Another serious challenge is keeping things up to date, at all times, regardless of scale. Permissions need live updates in order to ensure data security – permission rules change often, and it’s essential that they’re reflected as soon as possible. However, factors such as API rate limits can hinder information capture and processing across all types of documents.

Untracked permission updates can result in serious security issues

Similar to crawler development, API issues and flakes also present a considerable problem when dealing with permissioning information. Permissioning structures need to be flexible and resilient enough to deal with these complications, yet robust enough to understand how each document should be displayed to each user. 
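One common way to cope with rate limits and flaky responses is to retry with exponential backoff and track which documents still need a refresh. The sketch below assumes a caller-supplied `fetch` function and a hypothetical `TransientApiError`; neither comes from a real datasource API, and the retry counts and delays are arbitrary.

```python
import time


class TransientApiError(Exception):
    """Hypothetical error standing in for rate limits or flaky responses."""


def fetch_with_backoff(fetch, doc_id, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch(doc_id), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch(doc_id)
        except TransientApiError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


def sync_permissions(fetch, doc_ids, acl_store, sleep=time.sleep):
    """Refresh stored ACLs and return the IDs that could not be refreshed."""
    failed = []
    for doc_id in doc_ids:
        try:
            acl_store[doc_id] = fetch_with_backoff(fetch, doc_id, sleep=sleep)
        except TransientApiError:
            # The stored ACL is left untouched here. Whether to keep serving
            # it or hide the document until the next sync is a policy choice.
            failed.append(doc_id)
    return failed
```

Returning the failed IDs lets a scheduler re-queue them instead of silently leaving permissions stale.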

Additionally, computing and storing document permissions needs to be performant and storage-efficient. While it’s tempting to mark up user permissions individually, document by document, doing so is extremely costly to index, and document loading performance suffers from the increased storage needs.

Unified permissions model 

Instead, it’s better to build a framework that’s compatible with all datasources – capable of understanding the different permission architectures for each source. However, datasource permissions can be as basic as a plain list of users per document or as complex as nested groups, user roles, and other unique properties.
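As a rough illustration of what a unified model might look like, the sketch below stores users and group references on each document and resolves nested groups at check time. The `Acl` shape, `expand_groups`, and `can_read` are hypothetical names, and the sketch assumes user and group names share one namespace without collisions.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Source-agnostic permissions for one document (hypothetical shape)."""
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)


def expand_groups(user, memberships):
    """Return every group the user belongs to, directly or through nesting.

    memberships maps a group name to its members, which may be users or groups.
    """
    # Invert the mapping: member -> groups that directly contain it.
    parents = {}
    for group, members in memberships.items():
        for member in members:
            parents.setdefault(member, set()).add(group)

    seen, stack = set(), [user]
    while stack:
        for group in parents.get(stack.pop(), ()):
            if group not in seen:  # also guards against cyclic nesting
                seen.add(group)
                stack.append(group)
    return seen


def can_read(user, acl, memberships):
    """True if the user is listed directly or via any (nested) group."""
    return user in acl.users or bool(acl.groups & expand_groups(user, memberships))


memberships = {
    "eng": {"ana", "platform-team"},
    "platform-team": {"ben"},
    "all-staff": {"eng", "cho"},
}
design_doc = Acl(groups={"eng"})
```

Here `ben` can read `design_doc` through `platform-team`, which is nested inside `eng`, while `cho` cannot. Keeping group references on the document, rather than a flattened user list, is one way to avoid the per-document storage cost, at the price of resolving memberships when access is checked.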

Each datasource’s permissions framework is also unique, requiring investigation and reverse engineering to discover how the datasource implements document visibility. Permissioning behavior is also often poorly documented, or the documentation is not comprehensive enough to cover every possible situation, making this even more challenging.

Datasources can also have unique behaviors that need to be handled in order to mirror the search functionality of that respective datasource. Some edge case examples to consider include:

  • Publicly shared documents that become visible to a user only after that user has visited them. Certain GDrive permissions behave this way.
  • Temporary access permissions. Some datasource permissions support assigning permissions for a limited time, so in order to mirror this behavior, systems need to be kept well in sync. 
  • Documents that don’t provide flattened lists of allowed users or groups. These documents only specify general “allowed” criteria and “disallowed” criteria, which need to be resolved manually.
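Two of these edge cases, temporary access and allow/deny criteria, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Grant` record, the `directory` of user attributes, and the rule that "disallowed" overrides "allowed" are all hypothetical, and a real datasource may resolve conflicts differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Grant:
    principal: str
    expires_at: Optional[datetime] = None  # None means the grant never expires


def active_principals(grants, now):
    """Return principals whose grants have not expired at time `now`."""
    return {g.principal for g in grants if g.expires_at is None or g.expires_at > now}


def resolve_criteria(directory, allowed, disallowed):
    """Flatten allow/deny criteria into an explicit set of user names.

    directory maps a user name to an attributes dict. A user matching the
    disallowed criterion is excluded even if the allowed criterion matches
    (an assumption; some sources may order these rules differently).
    """
    return {
        name
        for name, attrs in directory.items()
        if allowed(attrs) and not disallowed(attrs)
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grants = [
    Grant("ana"),
    Grant("ben", expires_at=now - timedelta(hours=1)),  # already expired
    Grant("cho", expires_at=now + timedelta(hours=1)),  # still valid
]
directory = {
    "ana": {"dept": "eng", "contractor": False},
    "ben": {"dept": "eng", "contractor": True},
    "cho": {"dept": "sales", "contractor": False},
}
readers = resolve_criteria(
    directory,
    allowed=lambda a: a["dept"] == "eng",
    disallowed=lambda a: a["contractor"],
)
```

Because expiry is evaluated against the current time at check time, an expired grant stops applying even if the index has not been re-synced since it lapsed.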

Developing a system that accounts for all the above can be an extensive endeavor requiring considerable engineering resources and effort. Here at Glean, it initially took us years to tune our solution to ably account for these complexities and edge cases, for corpora of any scale.

Good permissioning gets AI enterprise-ready

Establishing a properly built permissioning system for any generative AI or search solution is essential to ensuring that it's enterprise ready, but it’s not easy to build. Spending the time to fine-tune secure permissioning for generative AI solutions may result in delayed time to value, difficulties with internal alignment, and hefty costs down the line for developing, maintaining, and training it on your own.

Enterprise data is the most valuable resource for any organization, and understanding how best to leverage it – and protect it – will be ever more important as workers onboard new tools into complex digital work environments. If you’re interested in getting started with truly enterprise-ready generative AI today with an out-of-the-box permissioning solution, sign up for a Glean demo!
