Threat Intelligence with AI: The Power of Gemini, Drive and Inoreader Integration

Leveraging Gemini with a custom dataset for advanced threat analysis and correlation

Bank Security
9 min read · Dec 18, 2023

Key Takeaways

  • The integration of Google Gemini AI with Inoreader and Google Drive is a revolutionary approach to enhancing Cyber Threat Intelligence analysis;
  • The method involves creating a private dataset of CTI-relevant information from various sources, such as RSS feeds, Telegram channels, and Google Dorks;
  • This dataset is then used with Gemini to obtain more accurate and context-rich insights compared to traditional internet-connected AI platforms;
  • This integration is highly cost-effective and scalable, with the ability to add and analyze content in multiple languages;
  • Future enhancements could include integrating private CTI feeds and SIEM for even more tailored and organization-specific insights.

Introduction

I recently looked for a simple and effective AI solution that could answer the questions Cyber Threat Intelligence analysts raise during their daily activities. I ran tests on different platforms, including OpenAI's dedicated GPTs, ChatGPT 4 connected to the internet, and Gemini. None of the tested solutions provided precise, referenced answers when it came to investigating a specific threat actor, their TTPs, or particular IOCs.

All the internet-connected AI solutions I tested struggle to correlate the mass of public information and often suffer from “hallucinations”, sometimes giving inconsistent or wrong answers. This risks leading CTI analysts down the wrong path and making them lose the thread of the investigation. How much this matters depends on the type of question and on how precise the answer needs to be, but in CTI work evidence and facts are the starting point for assessments and eventual hypotheses, so errors cannot be tolerated.

After several tests, I discovered that the most effective approach for obtaining coherent, accurate, and well-referenced responses is to use a predefined dataset. Instead of relying on search results from Bing or Google, this method analyzes information from a specific collection of documents. Because the data to be analyzed is limited and clean, LLMs can correlate the available information more quickly and precisely.

So, what’s the most cost-effective AI option for integrating your private dataset with a pre-trained LLM?

Integrating Gemini with the Inoreader platform by way of Google Drive.

The basic configuration is outlined below:

Basic process overview

Dataset creation

To develop a continuously updated and consistent Cyber Threat Intelligence dataset with near-real-time information, it’s essential to begin by gathering data from all publicly available sources. This includes blog posts, Google Dorks, Telegram Channels, and more. These sources are full of relevant info and collecting them along with associated IOCs and TTPs related to the different Threat Actors can be incredibly valuable for a CTI Analyst.

Let’s start!

Gathering the latest articles and blog posts from different cybersecurity companies can be efficiently done using an RSS feed reader. Previously, I wrote about Inoreader, which remains a viable choice today for basic threat intelligence monitoring and OSINT activities. Inoreader also proves to be an ideal tool for this specific AI use case.

Inoreader configuration

Begin by setting up an account on Inoreader (https://www.inoreader.com/). After creating your account, add Cyber Security blogs manually or import them using an OPML file:

Cyber Security Blogs example
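
If you keep your list of blogs in a script or spreadsheet, the OPML file for the import can be generated rather than written by hand. The following is a minimal Python sketch, not part of the original setup: the function name `build_opml` and the two feed entries are placeholders chosen for illustration, and the output follows the usual OPML layout of one `outline` element per feed with an `xmlUrl` attribute.

```python
import xml.etree.ElementTree as ET


def build_opml(feeds: dict[str, str], title: str = "CTI feeds") -> str:
    """Return an OPML 2.0 document listing the given feeds.

    `feeds` maps a display name to its RSS/Atom URL.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in sorted(feeds.items()):
        # One <outline> per feed; xmlUrl carries the feed address.
        ET.SubElement(body, "outline", type="rss", text=name, title=name, xmlUrl=url)
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(opml, encoding="unicode")


# Placeholder entries: replace them with the blogs you actually follow.
EXAMPLE_FEEDS = {
    "Example Vendor Blog": "https://blog.example.com/feed.xml",
    "Example Research Lab": "https://research.example.org/rss",
}

if __name__ == "__main__":
    print(build_opml(EXAMPLE_FEEDS))
```

Saving the printed output as a `.opml` file gives you something to try with the import option described above.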

Connect your Google account to Inoreader:

Make sure to enable the option that automatically fetches the full content of new articles. RSS readers often display only previews or titles. Activating this feature ensures that the complete article content is saved to Google Drive, minimizing the risk of missing relevant information.

Automation

Now, you need to set up a custom rule in Inoreader to automatically transfer all new articles directly into Google Drive:

Rule creation on Inoreader platform

Thanks to this rule, each time a new article is published, a document file containing the full content will be automatically saved in your Google Drive folder:

Example of doc files related to the new published articles
Doc content example
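
To confirm that full articles rather than previews are reaching Drive, a quick word-count check over the saved documents can help. This is only a sketch under stated assumptions: it presumes the documents have been exported as `.txt` files to a local folder, and both the 150-word threshold and the function names are arbitrary choices made for illustration.

```python
from pathlib import Path


def flag_previews(docs: dict[str, str], min_words: int = 150) -> list[tuple[str, int]]:
    """Return (name, word count) for documents shorter than `min_words`."""
    flagged = []
    for name, text in sorted(docs.items()):
        count = len(text.split())
        if count < min_words:
            flagged.append((name, count))
    return flagged


def load_folder(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` into a name -> text mapping."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in Path(folder).glob("*.txt")
    }
```

Calling `flag_previews(load_folder("drive_export"))`, where `drive_export` is a hypothetical folder name, lists the files that look too short to be complete articles.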

Gemini Power

Now, all you need to do is open Gemini and use the default Google plugins integrated into the platform to query your dataset. If plugins are not enabled in your country, use this URL:

https://gemini.google.com/app?hl=en

To activate the plugin, you need to use the “@” symbol followed by “Google Drive” in your command.

Google Drive plugin

Google will access the documents saved in your Google Drive and will correlate all the data contained within them:

Gemini gains access to Google Workspace

Gemini then displays correlated results based exclusively on the articles saved in Google Drive, providing curated, precise and referenced information.

Below is a generic question about the DarkGate malware:

DarkGate Malware example

Notably, every statement can be verified thanks to the references listed at the end of the response. Below are the references for the DarkGate example shown above:

DarkGate References example

The larger your dataset, the more correlations Gemini can establish, and the more comprehensive and detailed its answers become. Moreover, since the dataset is clean and based on CTI articles, the responses tend to be precise and contextualized.

Now, it’s all about enjoying the process and exploring the possibilities…

Amazing Results

IOC context

This integration offers the capability to fully contextualize an IOC by leveraging diverse data sources.

SHA-256 context example

IP context example:

IP example

Typically, producing such a precise response could take a CTI analyst minutes or hours, depending on their skills, seniority and experience.

Here, you can obtain the full context around an IOC within just 5 seconds. Amazing, isn’t it?
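
To know which IOCs are worth asking about, it can help to pull them out of the saved articles first. The sketch below is a simplified Python illustration, not a complete IOC parser: the regular expressions cover only SHA-256 hashes and IPv4 addresses, the `refang` helper handles just two common defanging styles, and all names are my own.

```python
import re

SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def refang(text: str) -> str:
    """Undo common defanging such as 1.2.3[.]4 or hxxp://."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return the unique SHA-256 hashes and IPv4 addresses found in `text`."""
    clean = refang(text)
    hashes = sorted({h.lower() for h in SHA256_RE.findall(clean)})
    ips = sorted(
        ip
        for ip in set(IPV4_RE.findall(clean))
        # Drop dotted numbers whose octets cannot belong to an IPv4 address.
        if all(int(part) <= 255 for part in ip.split("."))
    )
    return {"sha256": hashes, "ipv4": ips}
```

Four-part version strings can still match the IPv4 pattern, so the output is a list of candidates to review rather than a verified IOC list.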

TTPs listing

You can also obtain a list of the TTPs used by a specific Threat Actor and map them to the MITRE ATT&CK framework.

SysJoker TTPs question:

TTPs list example

Mapping the listed TTPs with the MITRE ATT&CK Framework:

In this scenario, I chose not to use the Google Drive plugin since it involves a specific list of TTPs that can be easily verified using the online MITRE framework.
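
Verifying the mapping against the online framework can be made quicker by extracting the technique IDs from the answer and turning them into links. This is a small Python sketch with hypothetical function names; it assumes only the standard ATT&CK ID format (`T` plus four digits, with an optional three-digit sub-technique) and the URL layout used on attack.mitre.org.

```python
import re

TECHNIQUE_RE = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def technique_ids(answer: str) -> list[str]:
    """Return unique ATT&CK technique IDs in order of first appearance."""
    return list(dict.fromkeys(TECHNIQUE_RE.findall(answer)))


def attack_url(technique_id: str) -> str:
    """Build the attack.mitre.org page URL for a technique or sub-technique."""
    return "https://attack.mitre.org/techniques/" + technique_id.replace(".", "/") + "/"


if __name__ == "__main__":
    # Illustrative text only, not an actual Gemini answer.
    text = "PowerShell execution (T1059.001) and obfuscation (T1027)."
    for tid in technique_ids(text):
        print(tid, attack_url(tid))
```

Each printed link can then be opened to check that the technique description matches the behavior Gemini reported.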

Threat Hunting

For a threat hunter, this Gemini configuration is highly beneficial because it can produce a list of the command lines used by Threat Actors. This list can then be used to create dedicated hunting rules to search for these specific threats within your infrastructure.

PowerShell Command Lines example:

PS cmd lines

Microsoft Office 365 Advanced Threat Protection (ATP) Advanced Hunting rules creation:

MS ATP rules
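
If you prefer to assemble the hunting query yourself from the command lines Gemini returns, a small helper can do the string work. The sketch below is one possible approach, not the rule Gemini generated: it assumes the `DeviceProcessEvents` table and its `ProcessCommandLine` column from Microsoft's Advanced Hunting schema, the function names are hypothetical, and the fragments in the usage example are generic placeholders.

```python
def kql_literal(value: str) -> str:
    """Quote a string for KQL, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_hunting_query(fragments: list[str], days: int = 7) -> str:
    """Build a query matching any of the given command-line fragments."""
    if not fragments:
        raise ValueError("at least one command-line fragment is required")
    conditions = "\n    or ".join(
        f"ProcessCommandLine contains {kql_literal(frag)}" for frag in fragments
    )
    return (
        "DeviceProcessEvents\n"
        f"| where Timestamp > ago({days}d)\n"
        f"| where {conditions}\n"
        "| project Timestamp, DeviceName, FileName, ProcessCommandLine"
    )


if __name__ == "__main__":
    # Placeholder fragments: use the command lines from your own dataset.
    print(build_hunting_query(["-EncodedCommand", "DownloadString("]))
```

`contains` does a case-insensitive substring match, which is forgiving but can be noisy, so the resulting query should be tuned before it becomes a standing rule.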

Malware / Threat Actors comparison or similarities

You can see an example of how to correlate the information from two info-stealers and identify the differences between them:

DarkGate and Lumma example

Here, you’ll find a request to correlate data in order to identify the threat actor that most closely resembles Arid Viper in characteristics and behavior based on the available dataset:

This feature is extremely valuable for identifying Threat Actors who operate in similar contexts and might share the same techniques. Such a response typically involves correlating extensive information, a task that could take CTI analysts days to complete.

Additional Use Cases

Telegram integration

Inoreader allows you to add Telegram Channels to your feed list. This enables the aggregation of text messages from various Telegram channels, which Gemini can then easily correlate for comprehensive analysis:

Mysterious Bangladesh example

Automatic translation

Gemini supports 26 languages, so you can also add articles from non-English blogs to your feeds. Below is an example from the Russian-language F.A.C.C.T. feed:

Russian Feed Example

Saved on Google Drive in Russian:

Google Doc written in Russian

And correctly interpreted, translated and referenced by Gemini:

Gemini question on Russian article

Overcome the limits imposed by the admins of the LLM platform

At times, specific queries may run into restrictions set by the administrators of the LLM platform. Here is an example:

Limited response

Utilizing a private dataset can help bypass these limitations.

No limitations

Highly scalable

Over the past month, I’ve uploaded hundreds of articles, each averaging about 4 KB in size. Given Google Drive’s free 15 GB of storage, it will be quite a while before this limit is reached. Additionally, I haven’t observed any decrease in performance as the volume of documents grows over time. This suggests that the solution is highly scalable, at least at the current volume.
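
The storage headroom is easy to estimate. The following back-of-the-envelope Python sketch uses the 15 GB and 4 KB figures above; the rate of 500 articles per month is an assumption standing in for "hundreds", and the calculation assumes the whole free quota is available for this dataset.

```python
DRIVE_FREE_BYTES = 15 * 1024**3  # Google Drive free tier: 15 GB
AVG_DOC_BYTES = 4 * 1024         # about 4 KB per saved article


def capacity_in_docs(total_bytes: int = DRIVE_FREE_BYTES, doc_bytes: int = AVG_DOC_BYTES) -> int:
    """Number of average-sized documents that fit in the available space."""
    return total_bytes // doc_bytes


def years_until_full(docs_per_month: int) -> float:
    """Years needed to fill the free tier at a constant upload rate."""
    return capacity_in_docs() / docs_per_month / 12


if __name__ == "__main__":
    print(capacity_in_docs())                 # 3932160 documents
    print(round(years_until_full(500), 1))    # 655.4 years at 500 articles/month
```

Even if the real average size were ten times larger, the free tier would still hold several hundred thousand articles.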

$5 per month

One important aspect to note is that Inoreader reserves rule creation, Telegram channels, keyword monitoring and other advanced features for premium account holders. They frequently offer discounts, and after the 14-day free trial you can assess whether the described solution justifies the subscription cost, which is approximately $5 per month (the sole expense for the entire configuration: Inoreader + Drive + Gemini).

Possible Next Steps

Private Feed / Corporate data integration

The next steps could involve integrating private Cyber Threat Intelligence (CTI) sources, such as internal incident logs, an internal MISP or OpenCTI instance, and feeds from private CTI providers. This integration would enhance the private dataset stored in Google Drive, leading to more accurate and organization-specific responses from Gemini. Additionally, this could help Gemini identify threats specifically targeting your organization, based on your internal data:

Process evolution
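
As one possible way to feed internal MISP data into the same Drive folder, each event could be flattened into a short text document. The Python sketch below is only an illustration: it assumes the usual MISP event JSON export (an `Event` object with `info`, `date` and an `Attribute` list), the function name is hypothetical, and getting the resulting file into Drive is left to whatever sync method you already use.

```python
import json


def misp_event_to_text(event_json: str) -> str:
    """Render a MISP event export as a short plain-text document."""
    event = json.loads(event_json)["Event"]
    lines = [
        f"Title: {event.get('info', 'untitled event')}",
        f"Date: {event.get('date', 'unknown')}",
        "",
        "Indicators:",
    ]
    for attr in event.get("Attribute", []):
        # Append the analyst comment only when one is present.
        comment = f" ({attr['comment']})" if attr.get("comment") else ""
        lines.append(f"- {attr['type']}: {attr['value']}{comment}")
    return "\n".join(lines)
```

Keeping one document per event mirrors the one-document-per-article layout that the Inoreader rule produces.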

Scheduled Queries & SIEM integration

A potential future development of this project could involve incorporating the results of scheduled queries directly into a SIEM system. This approach would enable the CTI team to monitor specific keywords or threat actors continuously. The queries could be executed daily or weekly, depending on the need, with the results displayed on a SIEM dashboard for easy monitoring and analysis.

Overall process
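
To give an idea of what the SIEM side might receive, the sketch below wraps a query and its answer in a JSON event. It is a hypothetical Python outline: the field names, the `source` label and the prompt wording are all assumptions, and how the answer is actually retrieved from Gemini on a schedule is deliberately left open.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_prompt(keyword: str) -> str:
    """Compose a Drive-scoped question for a monitored keyword."""
    return f"@Google Drive Summarize what the saved articles report about {keyword}, with references."


def to_siem_event(keyword: str, answer: str, when: Optional[datetime] = None) -> str:
    """Serialize one query result as a single-line JSON event."""
    when = when or datetime.now(timezone.utc)
    event = {
        "timestamp": when.isoformat(),
        "source": "gemini-drive-cti",  # hypothetical source label
        "keyword": keyword,
        "prompt": build_prompt(keyword),
        "answer": answer,
    }
    return json.dumps(event, ensure_ascii=False)
```

One JSON object per line is a format most SIEM platforms can ingest from a file or an HTTP collector, which keeps the daily or weekly job simple.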

Conclusion

The integration of Google Gemini with Inoreader and Google Drive presents a revolutionary approach to enhancing Cyber Threat Intelligence analysis. By creating a dedicated dataset from various sources including RSS feeds, Telegram Channels, and Google Dorks, and then using this dataset with Google Gemini, CTI analysts can obtain more accurate, context-rich insights.

This method overcomes the limitations of traditional AI platforms, which often struggle to accurately correlate vast public information and can lead to misleading ‘hallucinations’. The predefined dataset approach minimizes these issues, as it allows the AI to work with a cleaner, more relevant data pool. As a result, the answers provided are more coherent, accurate, and well-referenced, which is critical in CTI where evidence and facts are paramount.

Moreover, this integration is highly cost-effective. With Inoreader’s advanced features available at a reasonable subscription cost and Google Drive’s substantial free storage capacity, it offers a scalable solution for CTI analysis without a hefty investment. The ability to add and analyze content in multiple languages further enhances its effectiveness.

This method is not perfect and is still prone to errors. There are certainly many things that can be improved, but it is an excellent starting point for those who want to approach the world of Threat Intelligence with the help of AI.

Future enhancements could include integrating private CTI feeds and SIEM, which would provide even more tailored and organization-specific insights. This could enable more precise threat detection, particularly for threats directly targeting an organization.
