How to Train Generative AI Using Your Company’s Data … and Your Security

Artificial intelligence can be a powerful asset; however, its real value doesn’t come out of the box. It requires extensive tuning and output monitoring to avoid “hallucinations,” meaning confident but completely wrong answers. You also need to protect what it learns from, or you could corrupt its value. Microsoft learned that within one day of letting an AI bot learn from the Internet. The Internet can be an awful place, and it turned that AI bot into a racist a-hole (see THIS for more info on that story).

The Harvard Business Review broke down creating a valuable AI asset into three categories (article found HERE). The first is building your own model, which requires a TON of data and resources, putting it out of reach for many organizations; they gave the example of BloombergGPT, built on 40 years of data. The second approach is fine-tuning an existing LLM, which also requires a lot of data, but not as much as building your own. The more common approach being used today is prompt-tuning: shaping an existing model’s answers with carefully crafted prompts and curated company content rather than retraining it.
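To make that last approach concrete, here’s a minimal sketch of what prompt-based grounding can look like. The document store and helper names are hypothetical (not from the HBR article): company content gets injected into the prompt at query time instead of being baked into the model’s weights.

```python
# Hypothetical sketch of the prompt-based approach: company knowledge is
# retrieved and injected into the prompt at query time, so no model
# weights are retrained. Function and variable names are illustrative.

def build_prompt(question: str, documents: list[str], max_docs: int = 3) -> str:
    """Assemble a prompt that grounds the model in approved company content."""
    # In practice the documents would come from a vetted, access-controlled
    # store (e.g., a vector database); here we just take the first few.
    context = "\n---\n".join(documents[:max_docs])
    return (
        "Answer using ONLY the company content below. "
        "If the answer is not in the content, say you don't know.\n\n"
        f"Company content:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

approved_docs = [
    "Policy 12.3: Customer data may not leave the production environment.",
    "Runbook: Password resets require manager approval via the HR portal.",
]

prompt = build_prompt("Who approves password resets?", approved_docs)
print(prompt)  # This prompt would then be sent to the LLM of your choice.
```

Notice that the “training” here is really content curation: the quality and integrity of that document store is what the model’s answers inherit.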

Any approach can work, but all of them come with the risk of data compromise and governance challenges. For example, if your SOC one day uses an XDR AI chatbot to analyze events, that system must be trained and monitored. I’ve seen powerful demos of Microsoft’s Security Copilot in which a SOC analyst asks “what is this event?” and the security chatbot gives a very detailed breakdown within seconds. That’s powerful; however, attackers will see it as a weakness and could inject corrupt data to teach the system to fail. It’s a similar concept to a SIEM depending on accurate time: if an attacker changes a SIEM’s clock, every event timestamp shifts, causing a complete meltdown of that solution’s value.
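To make the SIEM example concrete, here’s a minimal sketch of one simple guard against clock manipulation: flag incoming events whose timestamps drift too far from the collector’s own clock. The field names and threshold are hypothetical, not tied to any real SIEM product.

```python
# Minimal sketch of a timestamp sanity check for incoming SIEM events.
# Field names and the drift threshold are illustrative assumptions.
from datetime import datetime, timedelta, timezone

MAX_DRIFT = timedelta(minutes=5)  # assumption: tolerate 5 minutes of skew

def flag_suspicious_events(events: list[dict]) -> list[dict]:
    """Return events whose timestamps disagree with the collector clock."""
    now = datetime.now(timezone.utc)
    suspicious = []
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        if abs(now - ts) > MAX_DRIFT:
            suspicious.append(event)  # candidate clock tampering or bad source
    return suspicious

events = [
    {"id": 1, "timestamp": datetime.now(timezone.utc).isoformat()},
    {"id": 2, "timestamp": "2001-01-01T00:00:00+00:00"},  # wildly off
]
print(flag_suspicious_events(events))  # -> flags the event with the bad clock
```

The same idea generalizes to AI training pipelines: validate inputs against something you trust before letting them shape the system.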

The Harvard article closes with advice on how to fold generative AI capabilities into a company’s tasks. It recommends instilling these behaviors through training or policies:

  • Knowledge of what types of content are available through the system;
  • How to create effective prompts;
  • What types of prompts and dialogues are allowed, and which ones are not;
  • How to request additional knowledge content to be added to the system;
  • How to use the system’s responses in dealing with customers and partners;
  • How to create new content in a useful and effective manner.

I would add a few more:

  • How to identify inaccurate results and how often they occur.
  • How to protect existing knowledge from potential corruption.
  • How to identify how much impact corrupted or compromised content would have on the systems that consume it.
  • How to create point-in-time “go back points” (see the sketch after this list). These are a failsafe in case the system goes off the rails and becomes difficult to return to a useful state.
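
As a rough sketch of that last point, a “go back point” can be as simple as a hashed, timestamped snapshot of the knowledge store that you can verify and restore if the system drifts. The paths and helper names below are hypothetical; real deployments would use whatever backup tooling the platform provides.

```python
# Hypothetical sketch of point-in-time "go back points" for an AI
# knowledge store: snapshot the content, record a hash so the snapshot
# can be verified later, and restore a known-good copy if the system
# degrades. Paths are illustrative assumptions.
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path

KNOWLEDGE_DIR = Path("knowledge_store")   # assumption: content lives on disk
SNAPSHOT_DIR = Path("snapshots")

def hash_dir(path: Path) -> str:
    """Stable hash of all files, so a snapshot can be verified later."""
    digest = hashlib.sha256()
    for file in sorted(path.rglob("*")):
        if file.is_file():
            digest.update(file.read_bytes())
    return digest.hexdigest()

def create_restore_point() -> Path:
    """Copy the live store into a timestamped, hash-stamped snapshot."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = SNAPSHOT_DIR / stamp
    shutil.copytree(KNOWLEDGE_DIR, target)
    (target / "MANIFEST.sha256").write_text(hash_dir(KNOWLEDGE_DIR))
    return target

def restore(snapshot: Path) -> None:
    """Roll the live store back to a known-good snapshot."""
    shutil.rmtree(KNOWLEDGE_DIR)
    shutil.copytree(snapshot, KNOWLEDGE_DIR,
                    ignore=shutil.ignore_patterns("MANIFEST.sha256"))

# Example usage (assumes knowledge_store/ exists with content):
#   point = create_restore_point()
#   ... later, if outputs degrade ...
#   restore(point)
```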

I’m sure there are many other considerations; however, it’s important to first understand how important TRAINING is when considering AI as a resource. Once again, that Harvard article can be found HERE.
