It is important to retain logs for a significant amount of time in order to be able to investigate and analyze past attacks. This allows security teams to identify patterns and trends that can aid in the detection and prevention of future attacks. The retention period will vary depending on the organization's specific requirements and regulations, but it is generally recommended to keep logs for at least one to two years. Additionally, having a robust log management system in place that allows for easy search and retrieval of logs is crucial for effective threat hunting.
Immediate access to logs is one of the key weapons in hunting attackers. To analyze many attacks like SolarWinds, one has to go back roughly one to one and a half years. So, the first requirement for today's threat-hunting discipline is that logs be kept live and searchable for years.
Given the technical limitations of legacy SIEM and log management solutions against today's advanced attacks and threats, the legacy practice of keeping logs live for 15 to 90 days and then archiving them has become ineffective.
Vendors often fail to fully disclose just how much log storage can factor into the cost of running a legacy SIEM.
The guidance and instructions below also say to keep the logs alive for more than 180 days.
- Treasury Board of Canada Secretariat Event Logging Guidance
- Executive Office of the President, Office of Management and Budget: Improving the Federal Government’s Investigative and Remediation Capabilities Related to Cybersecurity Incidents
- European Union Agency for Cybersecurity (ENISA): Cybersecurity Certification
Suggested minimum log retention times by Mitre.org.
Some use cases that require 180 to 360 days of hot logs:
- To investigate a data breach, can you go back 180 days from 23-10-2022 (to Tuesday, April 26, 2022) and get the list of failed logins to the firewall?
- For example, whether your company has accessed avsvmcloud[.]com, a domain related to the SolarWinds hack, in the last 6, 12, or 18 months is a critical question. SolarWinds today, something else tomorrow.
- Which databases have the "SA" user logged into and what queries has it run in the last year?
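The date arithmetic behind the first use case can be sketched in a few lines of Python. This is a minimal illustration, not how any particular SIEM implements retention: the event dictionaries and their keys (`day`, `device`, `outcome`, `user`) are hypothetical, and a real product would run this as a query against its log store.

```python
from datetime import date, timedelta


def lookback_start(reference: date, days: int) -> date:
    """Return the first day of a lookback window that ends on `reference`."""
    return reference - timedelta(days=days)


def failed_firewall_logins(events, start: date, end: date):
    """Keep failed firewall logins whose day falls inside [start, end].

    Each event is a dict with hypothetical keys: day, device, outcome, user.
    """
    return [
        e for e in events
        if e["device"] == "firewall"
        and e["outcome"] == "failure"
        and start <= e["day"] <= end
    ]


end = date(2022, 10, 23)
start = lookback_start(end, 180)
print(start.isoformat())  # 2022-04-26
```

The query only returns anything useful if the events from April 2022 are still online and searchable, which is the point of the retention requirement.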
Here, users expect products to offer ways to manage disk costs. Asking customers for TBs or even PBs (for enterprise networks) of disk space is not workable otherwise. The leading SIEM solutions separate logs into hot, warm, and cold storage tiers.
The general approach is to place hot, warm, and cold data on progressively slower and cheaper disks according to their importance, starting with the fastest and most expensive disks for hot data.
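As a rough illustration of tiering, the sketch below assigns a log record to a tier purely by its age. The 30-day and 365-day boundaries are hypothetical values chosen for the example; real boundaries depend on budget, regulation, and how often older data is queried, and products may also consider factors other than age.

```python
from datetime import timedelta

# Hypothetical tier boundaries, chosen only for illustration.
HOT_MAX_AGE = timedelta(days=30)
WARM_MAX_AGE = timedelta(days=365)


def storage_tier(age: timedelta) -> str:
    """Map the age of a log record to a storage tier name."""
    if age <= HOT_MAX_AGE:
        return "hot"    # fastest, most expensive disks
    if age <= WARM_MAX_AGE:
        return "warm"   # still searchable online
    return "cold"       # slowest, cheapest disks


print(storage_tier(timedelta(days=7)))    # hot
print(storage_tier(timedelta(days=180)))  # warm
print(storage_tier(timedelta(days=500)))  # cold
```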
SureLog differs from its competitors. Like its rivals, it uses hot, warm, and cold storage techniques. In addition, SureLog SIEM compresses warm and cold data 40 to 80 times. This makes it possible to keep warm log data on fast disks, whereas competitors have to keep it on slow disks.
SureLog SIEM does not use Elasticsearch. It uses Vertical DB technologies in parallel or serial (depending on whether the data is hot or warm). As a result:
• SureLog SIEM can keep warm data on slower and cheaper disks, like its competitors.
• SureLog SIEM can compress the warm data 40 to 80 times on a fast disk, which makes fast storage economical again.
• SureLog SIEM can keep warm data on slower and cheaper disks while compressing it 40 to 80 times.
When deploying SureLog SIEM, customers typically see savings of between 50% and 75% on log storage compared to other SIEM vendors.
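The effect of compression on disk sizing is simple arithmetic, sketched below. The daily volume, the retention periods, and the assumption that the compression ratio applies uniformly to all warm data are hypothetical inputs for illustration; they are not measurements of any product, and indexes or metadata are ignored.

```python
def tier_size_gb(daily_raw_gb: float, days: int, compression_ratio: float = 1.0) -> float:
    """Disk space for one tier: raw volume divided by the compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return daily_raw_gb * days / compression_ratio


def total_size_gb(daily_raw_gb: float, hot_days: int, warm_days: int, warm_ratio: float) -> float:
    """Uncompressed hot tier plus compressed warm tier."""
    hot = tier_size_gb(daily_raw_gb, hot_days)
    warm = tier_size_gb(daily_raw_gb, warm_days, warm_ratio)
    return hot + warm


# Hypothetical example: 100 GB/day, 30 days hot, 335 days warm at 40x.
print(tier_size_gb(100, 335))            # 33500.0 GB if warm data were uncompressed
print(tier_size_gb(100, 335, 40))        # 837.5 GB at a 40x ratio
print(total_size_gb(100, 30, 335, 40))   # 3837.5 GB for hot plus warm
```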
SureLog disk usage calculation always includes "Raw Line Storage".
There may be situations where you want to receive the raw line in its entirety, including the original format. SureLog supports this feature by default.
SureLog can use the slowest and most economical disks available for cold data.
Log search is an important aspect of log analytics as it allows you to quickly and easily search through large amounts of log data to find specific events or patterns. However, log search alone is not enough for many use cases. Log analytics goes beyond just searching for specific events and patterns in log data. It also involves collecting, storing, and analyzing log data to gain insights and make decisions.
It is necessary to run hundreds of concurrent analytical queries such as the following, for both dashboards and analysis:
SELECT URL, SourceMachine, METHOD, REFERER, SUM(SENT) as total FROM taxonomy_object_firewalls GROUP BY URL, SourceMachine, METHOD, REFERER
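To make the difference between search and analytics concrete, the sketch below runs the same kind of aggregation against a small in-memory SQLite table. SQLite is only a stand-in here and says nothing about the engine a SIEM uses; the column types, the sample rows, and the added `ORDER BY` clause are assumptions made for the example.

```python
import sqlite3

QUERY = """
SELECT URL, SourceMachine, METHOD, REFERER, SUM(SENT) AS total
FROM taxonomy_object_firewalls
GROUP BY URL, SourceMachine, METHOD, REFERER
ORDER BY total DESC
"""


def bytes_sent_by_request(rows):
    """Load rows into an in-memory table and return the aggregated totals."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the source only gives the column names.
    conn.execute(
        "CREATE TABLE taxonomy_object_firewalls "
        "(URL TEXT, SourceMachine TEXT, METHOD TEXT, REFERER TEXT, SENT INTEGER)"
    )
    conn.executemany(
        "INSERT INTO taxonomy_object_firewalls VALUES (?, ?, ?, ?, ?)", rows
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


sample = [
    ("a.example", "pc1", "GET", "-", 100),
    ("a.example", "pc1", "GET", "-", 50),
    ("b.example", "pc2", "POST", "-", 70),
]
for row in bytes_sent_by_request(sample):
    print(row)
```

A plain search would return the three matching lines; the aggregation collapses them into per-group totals, which is what dashboards typically need.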
The analytical infrastructure of SureLog SIEM is quite robust, and SureLog SIEM offers fast log search capabilities.
SureLog SIEM manages log ingestion, log search, and thousands of concurrent analytical queries with minimal system resources.
Both the fastest log search and hundreds of log analytics queries can be performed with SureLog, thanks to Vertical DB technologies and the SureLog architecture.
The fundamental SIEM correlation rules will change depending on the organization's specific scenario and threat environment. But some frequently used correlation rules include:
- Network intrusion detection: detection and notification of suspicious network activity, such as port scans or illegal access attempts.
- Malware detection: recognizing and warning of malware infection symptoms, such as file hashes or known dangerous IPs.
- Anomaly detection: Identifying and alerting on unusual or unexpected behavior, such as a user logging into the network from an unusual location or at an unusual time.
- Compliance monitoring: locating and alerting on activities that contravene regulatory compliance standards, such as unauthorized users accessing sensitive data.
- Whitelisting: Identifying and alerting on activity that deviates from a known good baseline, such as a user running an unauthorized application or connecting to an unauthorized device.
There are many use cases that are common to nearly all SIEM solutions [3,4,5,6,7,8,9].
Well-known vendors' rule lists are also published, for example by IBM QRadar, Securonix, and RSA NetWitness.
Well-known, standard, or simple rules are not enough for today’s attacks. The majority of SIEM companies make use of 15 to 20-year-old correlation technologies, for example IBM QRadar and RSA NetWitness.
Although these correlation engines are incredibly robust and offer excellent defense against attacks, many more characteristics will give us a stronger advantage in this conflict.
Some features and rule formats that today’s SIEM solutions should support:
- Never-seen-before rules:
  - Warn if a user does something they have never done before.
- Long-term (possibly several months) activity monitoring rules:
  - Warn if a user who has not had a VPN connection for at least 15 days (20, 30, 40 … 265 days) has remote interactive logons on more than one (1) workstation in a short time.
  - No activity for more than 60 days: this account has not logged in for over 60 days.
  - Password changes for the same user more than three times within 45 days.
- Long-term periodicity monitoring (over weeks):
  - Warn if a user has visited malicious categories on the proxy at least once a day for a week (Bot Networks, Uncategorized, Malware, Spyware, Dynamic DNS, Encrypted Upload).
- Rare detection:
  - Warn if there is port usage that is very rare (for example, under 3%).
- Statistical/outlier-based anomaly detection:
  - Detect anomalies in the per-user ratio of login successes to failures.
  - Monitor all logins and access during non-working hours, and check geolocation to find unusual (never seen before) behavior.
- Time comparison rules (compare the times at which events occurred):
  - Warn if the time between two failed login events of the same user is less than one minute.
- Predefined cybersecurity-specific detection algorithms:
  - Mail Masquerade Detection: Warn if an e-mail was received from an address similar to the original e-mail address, like firstname.lastname@example.org and ali.veli@citibαnk.com
  - Masquerading Detection: Detect masquerading of system utilities, tasks, and services (T1036.003 Rename System Utilities, T1036.004 Masquerade Task or Service).
  - Hunting malware and viruses by detecting random strings.
  - Shannon Entropy Score Alert:
    - Processes with random names.
    - If the entropy of a file or directory is significantly higher than the baseline, it could indicate the presence of malware or unauthorized changes.
  - Levenshtein Score Alert (domains, file and process names, etc.):
    - Processes matching or similar to system processes in unexpected directories
    - Account created with a name similar to "Admin"
    - Account created with a name similar to "Administrator"
    - Account created with a name similar to the local service account naming convention
  - Newly-Registered Domains Visited (requires WHOIS enrichment)
  - Identifying Benign Websites: Top 1 Million Domains. If a domain has been created in the last 24 hours and this domain is in the top 1 million (Cisco Umbrella 1 million, https://majestic.com/reports/majestic-million, https://tranco-list.eu/, https://www.domcop.com/top-10-million-websites) list and not in our Whitelist, block it via NAC or Firewall.
- Rule as code: Develop correlation rules with Java or Python code.
- Sigma Rules: Running Sigma rules without conversion is a big plus. All leading SIEM providers must convert Sigma rules to their own query language before using them, and then run those queries manually or on a schedule. The requirement, however, is to run Sigma rules in real time and without conversion.
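Three of the behavioral rule types listed above (never seen before, rare detection, and time comparison) can be sketched with nothing more than counters and timestamps. This is a simplified illustration under stated assumptions, not how any SIEM engine implements them: the class and function names are hypothetical, state is held in memory, and a production rule would need a learning period and persistent storage.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


class NeverSeenBefore:
    """Remember every (user, action) pair and flag first occurrences."""

    def __init__(self):
        self._seen = defaultdict(set)

    def check(self, user: str, action: str) -> bool:
        """Return True the first time `user` performs `action`."""
        is_new = action not in self._seen[user]
        self._seen[user].add(action)
        return is_new


def rare_ports(port_events, threshold=0.03):
    """Return ports whose share of all observed events is below `threshold`."""
    counts = Counter(port_events)
    total = sum(counts.values())
    return sorted(p for p, c in counts.items() if c / total < threshold)


def rapid_failed_logins(events, max_gap=timedelta(minutes=1)):
    """Return users with two consecutive failed logins less than `max_gap` apart.

    `events` is an iterable of (user, timestamp) tuples for failed logins.
    """
    last_failure = {}
    flagged = set()
    for user, ts in sorted(events, key=lambda e: e[1]):
        previous = last_failure.get(user)
        if previous is not None and ts - previous < max_gap:
            flagged.add(user)
        last_failure[user] = ts
    return flagged
```

For example, with 100 observed connections of which 2 use port 4444, `rare_ports` reports 4444 because its 2% share is under the 3% threshold. Note that `NeverSeenBefore` flags everything a brand-new user does, which is why real deployments usually suppress alerts until a baseline exists.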
SureLog SIEM supports all those kinds of correlation techniques.
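The Shannon entropy and Levenshtein score alerts mentioned in the list rest on two standard algorithms, sketched below with the Python standard library only. The helper `looks_like` and its default threshold of 2 edits are hypothetical choices for illustration; thresholds and baselines in a real deployment have to be tuned against local data.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def looks_like(name: str, protected: str, max_distance: int = 2) -> bool:
    """True if `name` is close to, but not identical to, `protected`."""
    distance = levenshtein(name.lower(), protected.lower())
    return 0 < distance <= max_distance


print(shannon_entropy("abcd"))                         # 2.0
print(levenshtein("svchost.exe", "svch0st.exe"))       # 1
print(looks_like("citibαnk.com", "citibank.com"))      # True
```

Entropy on very short names is a weak signal by itself, so it is usually compared against a baseline for the same field rather than against a fixed cutoff.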
Also, real-time correlation is an important feature to consider. For example, if a user is added to the domain admin group, we want to know it immediately.
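The idea of real-time correlation is that each event is evaluated as it arrives, rather than waiting for a scheduled search to run. The sketch below illustrates this with a Python generator; the event fields (`type`, `group`, `member`, `actor`), the event type name, and the group set are hypothetical and would map to whatever the log source actually provides.

```python
from typing import Iterable, Iterator

# Assumed set of groups worth alerting on; adjust to the environment.
PRIVILEGED_GROUPS = {"Domain Admins"}


def realtime_alerts(events: Iterable[dict]) -> Iterator[str]:
    """Yield an alert as soon as a matching event is consumed from the stream."""
    for event in events:
        if (
            event.get("type") == "group_member_added"
            and event.get("group") in PRIVILEGED_GROUPS
        ):
            yield (
                f"ALERT: {event.get('member')} added to "
                f"{event.get('group')} by {event.get('actor')}"
            )


stream = [
    {"type": "logon", "user": "alice"},
    {"type": "group_member_added", "group": "Domain Admins",
     "member": "mallory", "actor": "bob"},
]
for alert in realtime_alerts(stream):
    print(alert)
```

Because the function is a generator, the alert is produced when the matching event is read, without scanning stored data again.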
In this regard, the limitations of search-based and Cloud SIEM solutions regarding real-time correlation must be included in the risk calculation [13,14,15,16,17].
For example, real-time queries are turned off by default in Splunk Cloud. Microsoft Sentinel is also limited to 20 such rules per company. On-prem Splunk blocks 1 CPU core for each real-time correlation.
Running rules repeatedly on a schedule to get around these constraints requires additional system resources (costs), and leads to performance issues if those costs are not paid.
SureLog SIEM supports real-time correlation.
Anomaly Detection and Machine Learning (ML)
User and Entity Behavior Analytics (UEBA), machine learning, and anomaly detection are a big plus.
Budgets are important when it comes to UEBA. Besides the UEBA technology itself, its cost is a major factor in deciding which features to adopt.
In addition to the high license prices, the costs associated with the necessary system requirements must also be included when estimating the price of UEBA products.
A manufacturer with a solid reputation in this area has provided a message on system requirements below.
There are several machine learning (ML) models that can be used for User and Entity Behavior Analytics (UEBA). Some commonly used models for UEBA include:
- Anomaly Detection: This type of model is used to identify abnormal behavior within a dataset by identifying patterns that deviate from the norm. Anomaly detection models can be based on unsupervised learning techniques such as clustering or density-based methods, or supervised techniques such as decision trees or random forests.
- Random Forest: Random Forest is an ensemble method that builds multiple decision trees and combines their predictions to improve the overall accuracy of the model. Random Forest is a versatile method that can be used for both classification and regression problems and can be applied to UEBA problems.
- Recurrent Neural Networks (RNNs): RNNs are a type of neural network that are well suited for sequential data, such as time-series data. RNNs can be used to model the behavior of entities over time and can be used to identify patterns or anomalies in the data.
- Deep Learning: Deep learning models such as deep neural networks (DNNs) and convolutional neural networks (CNNs) can be used to analyze and classify large amounts of data, such as log files or network traffic. These models can be used to detect patterns and anomalies in the data that might indicate a security threat.
- Time Series Models: Time Series models are used to model the data over time. These models are useful in UEBA to model the behavior of entities over time, such as the number of logins, the number of files created, etc.
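As a much simpler stand-in for the models listed above, the sketch below flags outliers in a per-entity count series (for example, daily logins) using a z-score against a historical baseline. It is a basic statistical technique, not a random forest or neural network, and the threshold of 3 standard deviations and the sample numbers are assumptions for illustration.

```python
from statistics import mean, stdev


def zscore_anomalies(history, recent, threshold=3.0):
    """Return values in `recent` that deviate from the `history` baseline.

    `history` needs at least two observations so a standard deviation exists.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: anything different is unusual.
        return [x for x in recent if x != mu]
    return [x for x in recent if abs(x - mu) / sigma > threshold]


daily_logins = [10, 12, 11, 9, 10, 11, 12, 10]  # hypothetical baseline
print(zscore_anomalies(daily_logins, [11, 40]))  # [40]
```

The same function could be applied to other per-user counters, such as the number of files created per day.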
Sample use cases:
- Unusual access to servers
- A suspicious command that deletes shadow copies has been executed by the process vssadmin.exe
- Insider Threat Detection: UEBA can identify an employee who is accessing sensitive files outside of their normal job responsibilities, or who is sending large amounts of data outside of the organization.
- Account Compromise Detection: UEBA can identify an account that has been accessed from multiple locations or devices at unusual times, or that has been used to access sensitive data or systems that the user does not normally access.
- Advanced Persistent Threat Detection: UEBA can identify a system that is communicating with a known APT command and control server, or that is exhibiting unusual behavior such as downloading large amounts of data or encrypting files.
- Cloud Security: UEBA can identify unusual access to cloud resources, or detect and alert on suspicious activities such as data exfiltration, privilege escalation, or lateral movement.
- A user logs in remotely at 3 a.m. (usually only doing so locally during normal business hours), then makes repeated attempts to connect to a production database as an administrator.
- Robotic Pattern Observed - Failed Logins
- Unexpected System Restart
- HPA Misuse
- Privilege Escalation Detection: UEBA can identify a user who is attempting to access privileged accounts or systems without authorization, or who is attempting to use a privileged account to access sensitive data or systems.
- HPA Misuse Detection: UEBA can identify a privileged user who is accessing sensitive data or systems outside of their normal job responsibilities, or who is using privileged access to perform unauthorized actions.
- HPA Misconfiguration Detection: UEBA can identify a privileged user who is sharing their account credentials with others, or who is using privileged accounts in an insecure way.
- Suspicious HPA Activity Detection: UEBA can identify a privileged user who logs in from unusual locations or at unusual times, or who uses privileged accounts to access sensitive data or systems that they do not normally access.
- Typo-squatting (misspelled domains)
- Phishing Detection: UEBA can identify an email that contains a link to a misspelled domain that is similar to a legitimate domain, which might indicate a phishing attempt.
- Malicious Domain Detection: UEBA can be used to detect and respond to malicious domains by analyzing data such as DNS logs, network traffic, and user activity. For example, UEBA can identify a domain that is similar to a legitimate domain but that is being used to host malware or phishing content, which might indicate a malicious domain.
- Credential Phishing Detection: UEBA can identify a login attempt to a misspelled domain that is similar to a legitimate domain, which might indicate a credential phishing attempt.
- Spear-phishing Detection: UEBA can identify an email that contains a link to a misspelled domain that is similar to a legitimate domain and that is targeted to a specific individual or group, which might indicate a spear-phishing attempt.
- Vishing Detection: UEBA can identify a call that is made to a phone number that is similar to a legitimate phone number but that is being used for phishing or scamming purposes, which might indicate a vishing attempt.
- Data Exfiltration Detection: UEBA can identify a user who is copying large amounts of data to an external device or cloud storage account, or who is emailing sensitive data to a personal email account.
- File Access Anomaly Detection: UEBA can identify a user who accesses sensitive files outside of their normal job responsibilities or who accesses files that they haven't accessed before.
- Unusual File Modification Detection: UEBA can identify a user who modifies sensitive files outside of their normal job responsibilities or who modifies files that they haven't modified before.
- File Encryption Detection: UEBA can identify a user who is encrypting large numbers of files or who is encrypting sensitive files, which might indicate a security incident.
- File Deletion Detection: UEBA can identify a user who is deleting large numbers of files or who is deleting sensitive files, which might indicate a security incident.
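Several of the use cases above, such as the user who logs in at 3 a.m. but normally works during business hours, come down to comparing new activity against a per-user profile. The sketch below builds an hour-of-day profile from past logins. The class name, the 5% cutoff, and the choice to stay silent for users without history are hypothetical design choices for the example.

```python
from collections import Counter, defaultdict


class LoginHourProfile:
    """Per-user histogram of login hours, used to spot unusual login times."""

    def __init__(self, min_share: float = 0.05):
        self._hours = defaultdict(Counter)
        self.min_share = min_share

    def learn(self, user: str, hour: int) -> None:
        """Record one historical login for `user` at `hour` (0-23)."""
        self._hours[user][hour] += 1

    def is_unusual(self, user: str, hour: int) -> bool:
        """True if `hour` accounts for less than `min_share` of past logins."""
        counts = self._hours.get(user)
        if not counts:
            return False  # no baseline yet, so nothing to compare against
        total = sum(counts.values())
        return counts[hour] / total < self.min_share


profile = LoginHourProfile()
for _ in range(5):               # five hypothetical working days
    for hour in range(9, 18):    # logins between 09:00 and 17:00
        profile.learn("alice", hour)

print(profile.is_unusual("alice", 3))   # True
print(profile.is_unusual("alice", 10))  # False
```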
Exabeam, Securonix, Gurucul, Splunk, Microsoft Sentinel, SureLog SIEM [20,21,22], and many others utilize various machine-learning techniques.