Skill Level: Beginner

This article will help you to build a threat intelligence (TI) solution. We will discuss basic principles and stages: data collection, TI data enrichment, and analysis.


  1. Intro

    Imagine that you arrive at work, turn on your computer, and realize that your company’s website is down and your cargo is stuck at customs and cannot reach the warehouse. To top it all off, the screensaver displays an unfamiliar funny picture that someone else has set. To add insult to injury, the accountant approaches you with an emergency report saying that all of your organization’s funds have been withdrawn and your personal data has been leaked all over the Internet. Then you take a cup of coffee, walk up to the window, and see the neighboring firm across the street releasing your once unique products. The next thing you know, your lovely wife dumps you and runs off with a more successful competitor. At this point, you get it: you fell victim to a data breach.

    You could have avoided all of this if you had used a threat intelligence (TI) solution. But first, let’s figure out how it works and protects.

    Threat intelligence is intended to collect and analyze information on relevant threats in order to predict and prevent possible cyber-attacks. It includes the following stages: collecting threat data from different sources and accumulating it within a single system, enriching and analyzing this information, and then implementing the obtained knowledge.

  2. Data collection

    The following tools and techniques facilitate the process of harvesting threat information:

    Crawlers – automated systems that scour various online sources for data about known threats;

    Sandbox – an isolated environment allowing you to safely execute suspicious code in order to identify and analyze malicious software;

    Botnet monitoring – tracking networks of compromised computers that attackers control through Command & Control (C&C) servers;

    Honeypot – a network fragment segregated from the organization’s IT infrastructure that serves as a bait for the attacker;

    Sensors – agent programs that harvest useful data from numerous different devices.

    The database is also augmented with information on past data breaches, that is, sensitive details that ended up on the Internet by illegal means. These can include account credentials for systems and services, email addresses, credit card details, passwords, etc.

    Open source intelligence (OSINT) additionally provides feeds that span the following types of information: IP addresses and URLs that are known to distribute harmful files; samples and hashes of these files; lists of phishing sites and email addresses involved in phishing campaigns; the activity of C&C servers; the URLs used for scanning networks in order to identify system versions and vulnerabilities and perform banner grabbing; the IP addresses associated with brute-force attacks; and YARA signatures for detecting malicious software.
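
    Because such feeds mix several kinds of indicators, a common first step is to label each entry by type. The following is a minimal sketch of that idea, not a complete parser; the function name `classify_indicator` and the sample feed entries are assumptions made up for illustration.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hex-digest lengths of common hash algorithms.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"^[0-9a-fA-F]+$")


def classify_indicator(value: str) -> str:
    """Return a coarse type label for a raw feed entry."""
    value = value.strip()
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if HEX_RE.match(value) and len(value) in HASH_LENGTHS:
        return HASH_LENGTHS[len(value)]
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if "@" in value:
        return "email"
    return "unknown"


# Hypothetical feed lines (documentation-range IP, example domains).
feed = [
    "203.0.113.7",
    "http://malware.example.com/payload.exe",
    "d41d8cd98f00b204e9800998ecf8427e",
    "phisher@example.org",
]
labels = [classify_indicator(entry) for entry in feed]
print(list(zip(feed, labels)))
```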

    You can get plenty of useful information on the sites of CERT analytic centers and independent researchers’ blogs. These sources can give you the lowdown on existing vulnerabilities, the appropriate detection rules, and the investigation workflows. While looking into targeted attacks, the analysts obtain malware samples and hashes, as well as IP addresses, domain names and URLs hosting malicious content.

    The system is also supplemented with data on vulnerabilities and attack vectors recently discovered by partners, vendors and contractors.

    The TI solution additionally harvests data from information security systems, such as antimalware suites, IDS/IPS, firewalls, web application firewalls, traffic analysis tools, event logs, unauthorized access protection systems and the like.

    The entirety of harvested data is accumulated within a single platform that allows for enriching, analyzing and spreading threat information.
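
    As a rough illustration of accumulating data in one place, the sketch below merges indicators from several sources into a single store and remembers which sources reported each one. The source names and the `merge_feeds` helper are hypothetical, and lowercasing is a simplification that suits domain names but not case-sensitive URL paths.

```python
from collections import defaultdict


def merge_feeds(feeds: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized indicator to the set of sources reporting it."""
    merged: dict[str, set[str]] = defaultdict(set)
    for source, indicators in feeds.items():
        for raw in indicators:
            # Simplified normalization: trim whitespace and lowercase.
            indicator = raw.strip().lower()
            if indicator:
                merged[indicator].add(source)
    return dict(merged)


# Hypothetical sources and entries.
feeds = {
    "osint_feed": ["Evil.example.com", "203.0.113.7"],
    "sandbox": ["evil.example.com ", "198.51.100.23"],
}
store = merge_feeds(feeds)
for indicator, sources in sorted(store.items()):
    print(indicator, sorted(sources))
```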

  3. TI data enrichment

    The information collected on specific threats is augmented with contextual details, including threat name, detection time, geolocation, threat source, as well as circumstances, goals and motives of the attacker.

    Data enrichment is the next important step. It is the process of retrieving supplementary technical attributes for known attacks, including:

    • URLs
    • IP addresses
    • Domain names
    • Whois information
    • Passive DNS records
    • GeoIP, that is, geographic details of an IP address
    • Samples and hashes of malicious files
    • Statistical and behavioral information, such as the Tactics, Techniques and Procedures (TTP) of attack deployment.
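
    To make the idea concrete, here is a minimal sketch of enriching an IP address with GeoIP and passive DNS context. The lookup tables are hypothetical in-memory stand-ins; a real platform would query dedicated databases or services instead.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical stand-ins for GeoIP and passive DNS databases.
GEOIP_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "NL",
    ipaddress.ip_network("198.51.100.0/24"): "BR",
}
PASSIVE_DNS = {"203.0.113.7": ["evil.example.com", "cdn.example.net"]}


@dataclass
class EnrichedIndicator:
    ip: str
    country: str = "unknown"
    domains: list[str] = field(default_factory=list)


def enrich(ip: str) -> EnrichedIndicator:
    """Attach GeoIP and passive DNS context to an IP address."""
    address = ipaddress.ip_address(ip)
    record = EnrichedIndicator(ip=ip)
    for network, country in GEOIP_TABLE.items():
        if address in network:
            record.country = country
            break
    record.domains = list(PASSIVE_DNS.get(ip, []))
    return record


result = enrich("203.0.113.7")
print(result)
```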

  4. Analysis

    At the analysis phase, the system groups events and attributes related to a single attack by properties such as territory, timeframe, targeted economic sector, and criminal group. In other words, the threat intelligence solution correlates different events.
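
    One simple way to picture correlation is grouping events that share an indicator and occur close together in time. The sketch below assumes a one-hour window and made-up events; the field names and the `correlate` function are illustrative, not part of any particular product.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events from different hosts.
events = [
    {"host": "web-01", "indicator": "203.0.113.7", "time": datetime(2024, 5, 1, 10, 0)},
    {"host": "db-02", "indicator": "203.0.113.7", "time": datetime(2024, 5, 1, 10, 20)},
    {"host": "hr-05", "indicator": "198.51.100.23", "time": datetime(2024, 5, 3, 9, 0)},
]


def correlate(events, window=timedelta(hours=1)):
    """Group events sharing an indicator; start a new group when the gap
    between consecutive events exceeds the window."""
    by_indicator = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        by_indicator[event["indicator"]].append(event)

    groups = []
    for items in by_indicator.values():
        current = [items[0]]
        for event in items[1:]:
            if event["time"] - current[-1]["time"] <= window:
                current.append(event)
            else:
                groups.append(current)
                current = [event]
        groups.append(current)
    return groups


groups = correlate(events)
# Groups that touch more than one host hint at a coordinated attack.
multi_host = [g for g in groups if len({e["host"] for e in g}) > 1]
print(len(groups), [e["host"] for e in multi_host[0]])
```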

    To process feeds, first select their sources based on the specifics of the targeted sector, the types of attacks relevant to the company, and the attributes and IOCs (indicators of compromise) that cover risks left unaddressed by the protection system’s existing rules. The next stage is to determine the feeds’ value and prioritize them based on the following criteria:

    • The feed’s data source – chances are that the source is an aggregator of OSINT data and thus doesn’t provide any analytics of its own.
    • Relevance – the timeliness and “freshness” of the information being processed. Two parameters matter here: the time elapsed between the discovery of an attack and the distribution of the corresponding threat data should be as short as possible, and the source should publish feeds frequently enough to keep the threat data current.
    • Uniqueness – the amount of data not available in other feeds, as well as the scope of original analytics provided by the feed.
    • Occurrence in other sources. At first sight, it may seem that an attribute or IOC is more trustworthy if it’s encountered in feeds from several sources. In fact, though, some feeds may harvest data from the same source whose information may be unverified.
    • Completeness of the context: how well the information has been sorted, whether there are indications of the attack goals, economy sector, criminal group, instruments used, attack duration, etc.
    • Quality (false positive rate) of the rules for information security systems that are built on feed data.
    • Data usefulness – applicability of the feed’s data for investigating incidents.
    • Format of data presentation. The convenience of processing data and uploading it to the platform is taken into account. The questions to be answered are whether the threat intelligence platform of choice supports the required formats, and whether or not some data is lost along the way.
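
    Some of these criteria can be turned into numbers. The sketch below computes uniqueness as the share of a feed's indicators not seen in other feeds, then combines several criteria into a weighted score. The weights, the criteria names, and the sample values are assumptions chosen for illustration; each organization would tune them to its own priorities.

```python
def uniqueness(feed: set[str], others: list[set[str]]) -> float:
    """Share of a feed's indicators that no other feed contains."""
    if not feed:
        return 0.0
    seen_elsewhere = set().union(*others)
    return len(feed - seen_elsewhere) / len(feed)


def feed_score(criteria: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of criteria scores, each expected in [0, 1]."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * criteria.get(name, 0.0) for name in weights)
    return weighted / total_weight


# Hypothetical feeds and weights.
feed_a = {"203.0.113.7", "evil.example.com", "198.51.100.23", "bad.example.net"}
feed_b = {"203.0.113.7", "evil.example.com"}

weights = {"freshness": 2.0, "uniqueness": 1.0, "context": 1.0}
criteria_a = {
    "freshness": 0.5,
    "uniqueness": uniqueness(feed_a, [feed_b]),
    "context": 1.0,
}
score_a = feed_score(criteria_a, weights)
print(criteria_a["uniqueness"], score_a)
```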


    The following instruments are used to classify feed data:

    • Tags.
    • Taxonomies – the set of libraries categorized by attack deployment processes, threat distribution, information exchange, etc. For example, ENISA, CSSA, VERIS, Diamond Model, Kill Chain, CIRCL, and MISP have taxonomies of their own.
    • Clustering – the set of libraries classified by static indicators of threats and attacks. Some examples include the economy sectors being targeted; the instruments and exploits being leveraged; Tactics, Techniques and Procedures for infiltration, exploitation and persistence in the system based on the ATT&CK Matrix.
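
    Taxonomy-based classification is often expressed through machine tags of the form namespace:predicate or namespace:predicate="value", the convention used by MISP taxonomies. The following sketch parses such tags and filters them by namespace. The `sector` namespace and its values are made up for this example, and the regular expression is a simplified assumption rather than an official grammar.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for namespace:predicate and namespace:predicate="value".
TAG_RE = re.compile(
    r'^(?P<namespace>[^:="]+):(?P<predicate>[^:="]+)(?:="(?P<value>[^"]*)")?$'
)


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: Optional[str]


def parse_tag(tag: str) -> MachineTag:
    match = TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(match["namespace"], match["predicate"], match["value"])


def filter_by_namespace(tags, namespace):
    return [t for t in map(parse_tag, tags) if t.namespace == namespace]


# "sector" is a hypothetical namespace used only for this example.
tags = ['tlp:amber', 'sector:targeted="finance"', 'sector:targeted="energy"']
sector_tags = filter_by_namespace(tags, "sector")
print([t.value for t in sector_tags])
```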


    Analysts uncover the attackers’ TTP characteristics, overlay data and events upon the system intrusion model, and build chains of attack deployment. It’s important to form the general view of the compromise, considering the overall architecture of the system being protected as well as the ties between components. It’s also worth taking into account the probability of a more complex attack, one that will affect several hosts and exploit a number of vulnerabilities at a time.
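
    As an illustration of overlaying events on an intrusion model, the sketch below orders observations by the phases of the Kill Chain model and lists earlier phases for which no evidence has been found yet. The observations are invented, and the phase labels are spelled as lowercase identifiers for convenience.

```python
KILL_CHAIN = [
    "reconnaissance", "weaponization", "delivery", "exploitation",
    "installation", "command-and-control", "actions-on-objectives",
]
PHASE_ORDER = {phase: index for index, phase in enumerate(KILL_CHAIN)}

# Hypothetical observations from two hosts.
observations = [
    {"host": "web-01", "phase": "exploitation", "detail": "web shell upload"},
    {"host": "web-01", "phase": "reconnaissance", "detail": "port scan"},
    {"host": "db-02", "phase": "command-and-control", "detail": "beacon to 203.0.113.7"},
    {"host": "web-01", "phase": "delivery", "detail": "malicious request"},
]


def build_chain(observations):
    """Order observations by their position in the kill chain."""
    return sorted(observations, key=lambda o: PHASE_ORDER[o["phase"]])


def missing_phases(chain):
    """Phases before the latest observed one that have no evidence yet."""
    seen = {o["phase"] for o in chain}
    latest = max(PHASE_ORDER[p] for p in seen)
    return [p for p in KILL_CHAIN[:latest] if p not in seen]


chain = build_chain(observations)
gaps = missing_phases(chain)
print([o["phase"] for o in chain])
print(gaps)
```

    Gaps such as a missing installation phase tell the analyst where to look next, and observations spanning several hosts point to the more complex, multi-host attack mentioned above.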

  5. Uses

    Prediction is the essential task to perform based on the conducted analysis. The system determines the most likely attack vectors, given the specifics of the economic sector, geolocation, timeframe, offensive tools and the severity of the impact. The discovered threats are then prioritized according to the potential damage they would cause if carried out.
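
    A common way to rank threats, used here as an assumption rather than something the article prescribes, is to multiply an estimated likelihood by an impact rating. The threats and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    likelihood: float  # estimated probability, 0..1
    impact: int        # damage rating, 1 (minor) to 5 (critical)

    @property
    def risk(self) -> float:
        return self.likelihood * self.impact


# Hypothetical threats and estimates.
threats = [
    Threat("credential stuffing", 0.5, 2),
    Threat("phishing campaign abusing the brand", 0.75, 3),
    Threat("ransomware via exposed remote access", 0.25, 5),
]
ranked = sorted(threats, key=lambda t: t.risk, reverse=True)
for threat in ranked:
    print(f"{threat.risk:.2f}  {threat.name}")
```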

    Threat intelligence data helps detect leaks of the organization’s proprietary information that may have ended up on the Internet. It also allows for managing risks to the brand emanating from discussions of attack plans on darknet forums, illicit use of the brand name for phishing campaigns, disclosure of trade secrets, and the abuse thereof by competitors.

    The aggregated knowledge base can be applied to create attack detection rules for information security systems and to support incident response and investigation within a SOC (security operations center).
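
    In its simplest form, a detection rule built from threat data checks whether log entries mention a known malicious indicator. The sketch below assumes a blocklist of IP addresses and an invented firewall log format; real security systems use their own rule languages.

```python
import re

# Hypothetical blocklist derived from the knowledge base.
BLOCKLIST = {"203.0.113.7", "198.51.100.23"}
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def detect(log_lines, blocklist=BLOCKLIST):
    """Yield (line_number, matched_ip) for log lines mentioning a listed IP."""
    for number, line in enumerate(log_lines, start=1):
        for ip in IPV4_RE.findall(line):
            if ip in blocklist:
                yield number, ip


# Invented log lines in a made-up format.
log = [
    "2024-05-01T10:00:02 ACCEPT src=192.0.2.10 dst=203.0.113.7 dport=443",
    "2024-05-01T10:00:05 ACCEPT src=192.0.2.11 dst=192.0.2.50 dport=22",
]
alerts = list(detect(log))
print(alerts)
```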

    Security experts should regularly review the threat model and reassess the risks based on new circumstances.

  6. Summary

    Such a multilayered approach will allow you to thwart breaches at their early stage when the adversaries are only attempting to infiltrate the information system.

    A platform for collecting and analyzing threat data can also help your enterprise comply with security regulations that require sharing such information.

    All in all, taking advantage of cyber intelligence professionals’ experience in harvesting, processing and applying threat data allows IT security departments to take their companies’ data protection mechanisms to a whole new level.
