What is T.A.S.E.R?
Triage & Automated Security Event Response, or T.A.S.E.R for short, is a collection of open source and custom tools that are combined to create a modular, unified security case management and automation framework. One of the big features of T.A.S.E.R is that it is entirely modular and customizable, so you can switch from one tool to another in a fast-moving tech environment.
Each part of T.A.S.E.R below is going to get its own dedicated blog post; this post is more of an overview of the concept of T.A.S.E.R.
Parts of T.A.S.E.R
- API Gateway
- Terraform (deployment)
- CircleCI (deployment)
T.A.S.E.R by Part
One of the problems with security tools is that every tool sends alerts in a different format. How do you then get those alerts into your SIEM? Often this is what locks you into your current SIEM.
AlertEngine solves this issue. Think of AlertEngine as your translator and collector of alerts. It can gather alerts in several different ways (webhook, cron-based polling, email, S3, Splunk, etc.), pull out all the relevant data from each alert, turn it into an AlertEngine Alert Object, and send it to an S3 bucket. The new entry in the S3 bucket then triggers AlertForwarder, which takes the alert object and sends it to your SIEM or tool of choice, such as TheHive, Jira, or your current SIEM.
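To make the S3 hand-off concrete, here is a minimal sketch of what an AlertForwarder function could look like. It assumes alert objects are stored as JSON and that the function is subscribed to the bucket's object-created notifications; the `Records[].s3.bucket.name` and `Records[].s3.object.key` fields are the standard S3 event notification shape. The `destination` field on the alert, the `SENDERS` table and both sender functions are hypothetical placeholders, and the S3 read is passed in as a function so the sketch runs without AWS credentials.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import unquote_plus


# Hypothetical destination senders; real ones would call TheHive, Jira, etc.
def send_to_thehive(alert: dict) -> str:
    return f"thehive:{alert['title']}"


def send_to_jira(alert: dict) -> str:
    return f"jira:{alert['title']}"


SENDERS: Dict[str, Callable[[dict], str]] = {
    "thehive": send_to_thehive,
    "jira": send_to_jira,
}


def forward_alerts(event: dict,
                   read_object: Callable[[str, str], bytes],
                   default_destination: str = "thehive") -> List[str]:
    """Forward every alert object referenced by an S3 event notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        alert = json.loads(read_object(bucket, key))
        # 'destination' is a hypothetical field on the alert object.
        destination = alert.get("destination", default_destination)
        results.append(SENDERS[destination](alert))
    return results

# In a deployed Lambda, read_object could wrap boto3, for example:
#   s3 = boto3.client("s3")
#   read_object = lambda b, k: s3.get_object(Bucket=b, Key=k)["Body"].read()
```

Keeping the S3 read injectable also makes the forwarder easy to unit test with an in-memory dictionary.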
So how does AlertEngine really work?
AlertEngine is more of a concept than it is a tool. In short, AlertEngine is a collection of AWS Lambda functions; to be exact, one Lambda function per alert source. Depending on the alert source, the Lambda function is triggered by an event such as a POST request from your tool, by a new object in an S3 bucket, or by a cron schedule that polls an API endpoint for your tool. Once the data is in the Lambda function, the function's sole purpose is to extract data and create an AlertEngine object with all the details about the alert parsed out, for example IPs, usernames, hostnames, etc. Once the alert object is created, it gets queued in an S3 bucket and forwarded to your tool of choice.
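A minimal sketch of one such per-source function, for an imaginary IDS that POSTs JSON through API Gateway. The payload fields (`signature`, `src_ip`, `dst_ip`, `hostname`, `description`), the `AlertObject` schema, the S3 key layout and the `write_object` callable are all assumptions for illustration, not AlertEngine's real format. With the API Gateway Lambda proxy integration, the POST body arrives as a string in `event["body"]`, base64-encoded when `isBase64Encoded` is true.

```python
import base64
import ipaddress
import json
import re
import uuid
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List

# Anything that looks like a dotted quad; validated with ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


@dataclass
class AlertObject:
    """Hypothetical normalized alert shared by every alert source."""
    source: str
    title: str
    severity: str
    artifacts: Dict[str, List[str]] = field(default_factory=dict)
    raw: dict = field(default_factory=dict)
    alert_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def extract_ips(text: str) -> List[str]:
    """Return unique, valid IPv4 addresses found in free text, in order of appearance."""
    found: List[str] = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 matches the regex but is not an address
        if candidate not in found:
            found.append(candidate)
    return found


def parse_ids_alert(payload: dict) -> AlertObject:
    """Map one (imaginary) IDS webhook payload onto the common alert object."""
    ips = [payload[k] for k in ("src_ip", "dst_ip") if payload.get(k)]
    for ip in extract_ips(payload.get("description", "")):
        if ip not in ips:
            ips.append(ip)
    hostnames = [payload["hostname"]] if payload.get("hostname") else []
    return AlertObject(
        source="ids",
        title=payload["signature"],
        severity=payload.get("severity", "medium"),
        artifacts={"ip": ips, "hostname": hostnames},
        raw=payload,
    )


def process_webhook(event: dict, write_object: Callable[[str, str], None]) -> dict:
    """Parse an API Gateway proxy event, normalize it, and queue it for forwarding."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    alert = parse_ids_alert(json.loads(body))
    key = f"alerts/{alert.source}/{alert.alert_id}.json"
    # In Lambda, write_object could call s3.put_object(Bucket=..., Key=key, Body=data).
    write_object(key, json.dumps(asdict(alert)))
    return {"statusCode": 200, "body": json.dumps({"alert_id": alert.alert_id})}
```

A real `handler(event, context)` would simply call `process_webhook` with an S3-backed writer; every other alert source would get its own parse function that produces the same `AlertObject` shape.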
But how could it get better?
Over the last few months I have been thinking about how I could improve signaling and alerting in T.A.S.E.R. The idea I started to settle on is what I named SignalServer. The idea of SignalServer is to take a bunch of smaller signals from AlertEngine and automatically enrich their artifacts based on rules set for each alert type per alert source. The next step would be to run them through a simple rules engine where you can define a set of rules to create alerts. For example:
[ (1 SSO login from a new location) + (2 failed Duo pushes) ] IN [ 4 Hours ] WITH [ username ] IN COMMON
[ (2 CarbonBlack alerts) ] IN [ 24 Hours ] WITH [ hostname ] IN COMMON
[ (4 IDS Alerts) ] IN [ 36 Hours ] WITH [ src_ip, dst_port ] IN COMMON
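One way rules like these could be evaluated is a sliding time window over each group of signals that share the IN COMMON fields. The sketch below encodes the first example rule; the signal type names (`sso_new_location`, `duo_push_failed`), the `Signal` and `Rule` classes, and the choice to report only the first window that satisfies a rule are all assumptions, since SignalServer is still an idea.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class Signal:
    signal_type: str
    timestamp: datetime
    fields: Dict[str, str]


@dataclass
class Rule:
    name: str
    required: Dict[str, int]   # signal_type -> minimum count, e.g. 2 failed Duo pushes
    window: timedelta          # the IN [ ... Hours ] part
    common: Tuple[str, ...]    # the WITH [ ... ] IN COMMON fields


def evaluate(rule: Rule, signals: List[Signal]) -> Dict[tuple, List[Signal]]:
    """Return {common-field values: signals in the window} for each group that fires."""
    groups: Dict[tuple, List[Signal]] = defaultdict(list)
    for s in signals:
        if s.signal_type in rule.required and all(k in s.fields for k in rule.common):
            groups[tuple(s.fields[k] for k in rule.common)].append(s)

    hits: Dict[tuple, List[Signal]] = {}
    for key, group in groups.items():
        group.sort(key=lambda sig: sig.timestamp)
        start = 0
        for end, newest in enumerate(group):
            # Drop the oldest signals until the window spans at most rule.window.
            while newest.timestamp - group[start].timestamp > rule.window:
                start += 1
            in_window = group[start:end + 1]
            counts = Counter(sig.signal_type for sig in in_window)
            if all(counts[t] >= n for t, n in rule.required.items()):
                hits[key] = in_window
                break  # first time the rule fires for this group
    return hits
```

The same function could back the rule simulator: run a draft rule over historical signals and count how many groups it returns.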
After an alert is created, if another signal comes into SignalServer within the timeframe of the rule, it would be added to the list of related signals for the alert, and the alert would also be updated in the SIEM. This would help solve the de-duping and alert-flooding problems you get from time to time. The configuration for both the data enrichment and the alert rules should be editable from a web interface, with dynamic options based on the signals in SignalServer. It should also have a simulator that runs new rules over the signals already in SignalServer to tell you how often the alert would fire. And of course it would have searching and filtering capabilities as well as a well-documented API. Think of it as a very, very simplified Splunk interface.
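Attaching follow-up signals to an alert that is still open, instead of raising a duplicate, could be tracked with something as simple as the sketch below. Everything here is hypothetical: the class names, the `update_siem` callback (which would push the new related signal to TheHive or whatever SIEM you use), and the choice to measure the rule's timeframe from the most recently attached signal rather than from when the alert was created.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple

GroupKey = Tuple[str, ...]


@dataclass
class OpenAlert:
    rule_name: str
    group_key: GroupKey        # values of the rule's IN COMMON fields
    window: timedelta          # the rule's timeframe
    last_seen: datetime
    signal_ids: List[str] = field(default_factory=list)


class AlertTracker:
    """Attach follow-up signals to a live alert instead of creating a duplicate."""

    def __init__(self, update_siem: Callable[[OpenAlert], None]) -> None:
        self.update_siem = update_siem
        self.open: Dict[Tuple[str, GroupKey], OpenAlert] = {}

    def open_alert(self, rule_name: str, group_key: GroupKey, window: timedelta,
                   signal_ids: List[str], ts: datetime) -> OpenAlert:
        alert = OpenAlert(rule_name, group_key, window, ts, list(signal_ids))
        self.open[(rule_name, group_key)] = alert
        return alert

    def attach(self, rule_name: str, group_key: GroupKey, signal_id: str,
               ts: datetime) -> Optional[OpenAlert]:
        """Return the updated alert, or None if there is no live alert to attach to."""
        alert = self.open.get((rule_name, group_key))
        if alert is None or ts - alert.last_seen > alert.window:
            return None  # caller falls back to normal rule evaluation
        alert.signal_ids.append(signal_id)
        alert.last_seen = ts
        self.update_siem(alert)  # e.g. add the new signal to the existing SIEM alert
        return alert
```

Measuring from the last signal means a noisy source keeps extending one alert rather than opening new ones; measuring from creation would cap each alert's lifetime at the rule window.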
One of the things I didn't want to deal with starting off was writing my own SIEM. I also didn't want a SIEM that locked me into any one ecosystem or that wasn't open enough to customize the way I wanted it to work. In my previous jobs I have worked with TheHive Project. To put it simply, TheHive is an open source SIEM. It has a lot of the advanced features I was looking for, like:
- Artifacts: IPs, usernames, etc. that are related to an alert or case. These artifacts are used to show related cases from an alert or from other cases.
- Enrichment: Part of TheHive Project is Cortex, a data enrichment tool that allows you to enrich data such as a domain. For example, you can run a VirusTotal domain report, run a WHOIS and DNS lookup, and block the domain in Cisco Umbrella, all from a TheHive case. I have also connected Cortex to Positron to use its unified data enrichment engine.
- Open API: TheHive has a very open API that allows you to integrate your own hooks into it. For example, I have created an integration where you simply click a button and a new Slack channel is created for that case; every message sent in that channel is logged as a case log in a log folder dedicated to the channel. If you click the button again, it creates a second channel, so you can invite different people when you need separate conversations for that case.
- Scalable: I needed something that could handle a large volume of alerts and cases without slowing down. Right now one of my test environments has over 125,000 alerts and 16,000 cases, and the count grows by the day. I have seen very little performance impact at those numbers, and ideally you should have a much lower volume of alerts, since each alert should be actionable (SignalServer will help with this too).
- Event Webhooks: I needed a way to trigger automation or notifications from events in TheHive. This is a bit of a workaround, but it works well. TheHive offers only one webhook, which is called for every event in TheHive and is designed for audit logging. However, it is very detailed, and with a Lambda function it can be used as an event trigger.
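A sketch of how a single Lambda could turn TheHive's audit webhook into event triggers by dispatching on object type and operation. The field names (`operation`, `objectType`, `object`) are assumed to follow TheHive's webhook payload; verify them against your TheHive version. The handlers, the `SecurityBot` tag and the returned strings are hypothetical stand-ins for real automations.

```python
import json
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], str]
ROUTES: Dict[Tuple[str, str], List[Handler]] = {}


def on(object_type: str, operation: str):
    """Register a handler for one (objectType, operation) pair."""
    def register(fn: Handler) -> Handler:
        ROUTES.setdefault((object_type, operation), []).append(fn)
        return fn
    return register


@on("alert", "Creation")
def start_securitybot(obj: dict) -> str:
    # Hypothetical: only alerts carrying a 'SecurityBot' tag start the bot.
    if "SecurityBot" in obj.get("tags", []):
        return f"securitybot:{obj.get('title')}"
    return ""


@on("case", "Update")
def notify_case_update(obj: dict) -> str:
    return f"case-updated:{obj.get('caseId')}"


def dispatch(event: dict) -> List[str]:
    """Entry point for the audit webhook; most events match no route and are ignored."""
    body = json.loads(event["body"]) if "body" in event else event
    key = (body.get("objectType", ""), body.get("operation", ""))
    results = []
    for fn in ROUTES.get(key, []):
        result = fn(body.get("object", {}))
        if result:
            results.append(result)
    return results
```

Because the webhook fires for every audit event, the important design point is that unmatched events cost almost nothing: one dictionary lookup and an early return.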
With all of that said, there are a few downsides:
- No per-case ACLs. This means everyone can see every case. However, this is something they are currently working on.
- No teams. There is only one queue. To work around this, I add a tag like Team:AppSec, and you can filter your queue by your team tag. I also use this tag to know which team to send alert notifications to. However, this is something they are currently working on.
- No automatic data enrichment. Enrichment has to be done manually or triggered by an API call. Also, you can't run data enrichment on alerts, only on cases. This means I can't pivot from an alert, realize it is related to a case I'm working on, and merge it. That's why I want SignalServer to do data enrichment. (I also don't want to lock myself into TheHive because of this.)
- Case logs are editable. To me, case logs and notes should not be editable. You should always have a solid timeline to go off of when reviewing an incident.
- No easy way to export a full case. There needs to be a way to fully export a case with all of its related alerts and artifacts into a simple CSV, JSON, or similar format. Right now it would take multiple API calls to say "give me all the usernames from new country login alerts."
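Building on the team tag workaround, here is a small sketch of how a tag like `Team:AppSec` could be turned into a notification destination. The `TEAM_CHANNELS` mapping, the channel names and the fallback channel are hypothetical.

```python
from typing import Dict, List, Optional

TEAM_PREFIX = "Team:"

# Hypothetical mapping from team name to the Slack channel for its notifications.
TEAM_CHANNELS: Dict[str, str] = {
    "AppSec": "#appsec-alerts",
    "CorpSec": "#corpsec-alerts",
}


def teams_from_tags(tags: List[str]) -> List[str]:
    """Extract team names from tags such as 'Team:AppSec', skipping empty ones."""
    return [t[len(TEAM_PREFIX):] for t in tags
            if t.startswith(TEAM_PREFIX) and len(t) > len(TEAM_PREFIX)]


def notification_channels(tags: List[str],
                          fallback: Optional[str] = "#security-alerts") -> List[str]:
    """Channels to notify for an alert; unknown or missing teams use the fallback."""
    channels = [TEAM_CHANNELS[t] for t in teams_from_tags(tags) if t in TEAM_CHANNELS]
    if not channels and fallback:
        channels = [fallback]
    return channels
```

The fallback keeps alerts with a missing or misspelled team tag from being silently dropped.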
GLaDOS is an open source Slack bot framework I created. When I first started working on T.A.S.E.R, I created SecurityBot, a simple bot that was triggered from TheHive alerts by a tag in the alert. The first SecurityBot automation was related to users logging into systems from a new country for the first time. It would take an alert, convert it into a case (it needs to be a case to add case logs), then reach out to the user to confirm their activity, and depending on the severity, it would also reach out to their manager for confirmation. At this point the Slack bot code and functions lived in AlertEngine, because I already had continuous deployment set up there and it already had the functionality I needed. This was a huge success, and from there the idea of a Slack bot framework got started.
GLaDOS is a framework designed to run in AWS, using API Gateway to provide one common API endpoint and interface for all of the Slack bots you wish to run. This means that if someone wants to spin up their own Slack bot, they no longer need to worry about how to deploy the API Gateway or keep it running on a server. Instead, they just need to create their GLaDOS plugin and make a pull request to the GLaDOS deployment repo; once approved, it should be auto-deployed. Each GLaDOS plugin is given its own namespace, and the routing of things like buttons and interactions is all done for you by the GLaDOS framework. This means you can focus on writing your Slack bot and not on how to keep it running and secure.
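A minimal sketch of two things a framework like GLaDOS handles for plugin authors: checking that a request really came from Slack, and routing an interaction to the right plugin namespace. The signature check follows Slack's documented request-signing scheme (`v0=` plus an HMAC-SHA256 of `v0:{timestamp}:{body}` keyed with the signing secret, rejecting requests older than five minutes), and Slack sends interaction payloads as a form-encoded `payload` field containing JSON. The `namespace:action` convention for `action_id` and the plugin registry are hypothetical, not necessarily how GLaDOS does it.

```python
import hashlib
import hmac
import json
import time
from typing import Callable, Dict, Optional
from urllib.parse import parse_qs

PluginHandler = Callable[[str, dict], str]
PLUGINS: Dict[str, PluginHandler] = {}


def register_plugin(namespace: str, handler: PluginHandler) -> None:
    """Give each plugin its own namespace; duplicates are rejected."""
    if ":" in namespace or namespace in PLUGINS:
        raise ValueError(f"invalid or duplicate namespace: {namespace}")
    PLUGINS[namespace] = handler


def verify_slack_signature(secret: str, timestamp: str, body: str, signature: str,
                           now: Optional[float] = None) -> bool:
    """Check X-Slack-Signature against HMAC-SHA256 of 'v0:{timestamp}:{body}'."""
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > 60 * 5:
        return False  # stale request, possibly replayed
    base = f"v0:{timestamp}:{body}".encode()
    expected = "v0=" + hmac.new(secret.encode(), base, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def route_interaction(raw_body: str) -> str:
    """Route a Slack block_actions payload to the plugin named in its action_id."""
    payload = json.loads(parse_qs(raw_body)["payload"][0])
    action = payload["actions"][0]
    # Hypothetical convention: action_id = "<plugin namespace>:<action name>".
    namespace, _, action_name = action["action_id"].partition(":")
    handler = PLUGINS.get(namespace)
    if handler is None:
        return f"unknown plugin: {namespace}"
    return handler(action_name, action)
```

Verification has to run on the raw request body before it is parsed, which is one more reason to keep it in the shared framework rather than in every plugin.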
Positron is a centralized query action framework. Think of it as one API to rule them all. The idea here is to standardize queries, for example IP_TO_HOSTNAME. When that query is issued to Positron, it is sent to all the plugins that support it. For example, you may have DHCP logs, osquery, CarbonBlack, etc. How do you know which one has the best data? You don't. So instead, why not ask them all, rank and correlate the responses, and standardize the response?
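The ask-them-all idea could look roughly like this: every plugin registered for a query gets the same input, and identical answers pool their confidence. Summing plugin-reported confidence is just one assumed ranking strategy; the `Positron` class, the plugin callable shape and the result dictionaries are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# A plugin answers a query with (answer, confidence between 0 and 1), or None.
PluginFn = Callable[[str], Optional[Tuple[str, float]]]


class Positron:
    def __init__(self) -> None:
        # query name -> {plugin name -> plugin function}
        self.plugins: Dict[str, Dict[str, PluginFn]] = defaultdict(dict)

    def register(self, query: str, plugin_name: str, fn: PluginFn) -> None:
        self.plugins[query][plugin_name] = fn

    def ask(self, query: str, value: str) -> List[dict]:
        """Send the query to every plugin that supports it and rank the answers."""
        votes: Dict[str, dict] = {}
        for name, fn in self.plugins.get(query, {}).items():
            try:
                result = fn(value)
            except Exception:
                continue  # one broken data source should not fail the whole query
            if result is None:
                continue  # this source has no data for the value
            answer, confidence = result
            entry = votes.setdefault(answer, {"answer": answer, "score": 0.0, "sources": []})
            entry["score"] += confidence
            entry["sources"].append(name)
        # Highest pooled confidence first; ties broken alphabetically for stable output.
        return sorted(votes.values(), key=lambda e: (-e["score"], e["answer"]))
```

For example, if DHCP and osquery plugins both return the same hostname for an IP while a CarbonBlack plugin returns a different one, the shared answer ranks first and lists both sources, so callers see the correlation as well as the answer.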
There are a few problems that this solves:
- Limit Access to Data: Not everyone needs full access to logs. Instead, you can set ACLs per query that define which user groups can issue that query. This way analysts can have access to the queries without having direct access to the logs.
- Consistent Data: Because all data sources are queried the same way for the same query, you won't have the issue where one tool was written using data source A and another tool uses data source B.
- Maintainability and Evolution: Consider migrating from one tool to another. Let's say I want to go from using Cisco to using Palo Alto Networks (PAN). All I have to do to move all of my tools from Cisco to PAN is enable the PAN plugin in Positron. Because you use pre-defined queries like IP_TO_HOSTNAME, zero changes are required to the tools and clients that use Positron.
- Secrets: The more internal clients and tools use Positron, the fewer tools and scripts have access to secrets that could potentially be stored insecurely.
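The per-query ACL from the first point can be as simple as a lookup from query name to allowed groups, with unknown queries denied by default. The `QUERY_ACL` table, the group names and the `QueryDenied` exception are hypothetical.

```python
from typing import Dict, Set

# Hypothetical ACL: which user groups may issue each Positron query.
QUERY_ACL: Dict[str, Set[str]] = {
    "IP_TO_HOSTNAME": {"soc-analysts", "incident-response"},
    "USER_MANAGER": {"incident-response", "securitybot"},
}


class QueryDenied(PermissionError):
    """Raised when a caller's groups do not allow the requested query."""


def is_allowed(query: str, user_groups: Set[str]) -> bool:
    # Unknown queries map to an empty set, so they are denied by default.
    return bool(QUERY_ACL.get(query, set()) & user_groups)


def authorize(query: str, user_groups: Set[str]) -> None:
    """Call before dispatching a query to any plugin."""
    if not is_allowed(query, user_groups):
        raise QueryDenied(f"{query} is not permitted for groups {sorted(user_groups)}")
```

Because the check happens inside Positron before any plugin runs, analysts and scripts only need permission for the query itself, never the credentials for the underlying log sources.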
Up until this point, I have been able to do basic automation using TheHive and AlertEngine, for example escalating from a Slack bot message to a message to an employee's manager. More advanced automation is something I am finally starting to get around to. The most important thing for automation is to have a good foundation to build on. For example, even the basic automation case I described needed:
- An alert to trigger off of: No response from the user within 4 hours of the Slack bot sending the message.
- A way to actually trigger it: I used AlertEngine to look for open cases owned by SecurityBot. In TheHive I added a custom field, SecurityBot_Timeout, set to 240 minutes. I would look at all open cases and check whether more than 240 minutes had passed since the message was sent. If so, I would send a message to the user's manager.
- A way to find their manager: I make a user_manager call to Positron and pass in the username with a tag of target_user, marking it as the user to find the manager for.
- A way to send the Slack message: For this I use GLaDOS to send the message to the user's manager.
- A way to record the response: When the manager clicks a button, I need to update the case. For this I embed the CaseID into the value of the button in Slack.
- A way to escalate the case: If the manager says that this isn't right, I need a way to page the team. If the response requires us to escalate the case, I use a custom Cortex plugin to find the team's on-call person in PagerDuty using Positron, assign the case to them, and then, if it's a critical case, send a page to the on-call employee. If it's not critical, they get a Slack message saying the case has been assigned to them and will be in their queue in the morning.
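The timeout check and the button round trip are the glue in this flow. A minimal sketch, assuming cases have already been fetched from TheHive into plain objects: the `Case` fields other than the `SecurityBot_Timeout` value, the `manager_notified` flag (an assumed addition so a manager is messaged only once), the button labels and the `securitybot:` action IDs are hypothetical. The button dictionaries follow Slack's Block Kit button element format (`type`, `text`, `action_id`, `value`).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Case:
    case_id: str
    owner: str
    status: str
    timeout_minutes: int       # value of the SecurityBot_Timeout custom field
    message_sent_at: datetime  # hypothetical: when SecurityBot messaged the user
    manager_notified: bool = False


def cases_needing_escalation(cases: List[Case], now: datetime) -> List[Case]:
    """Open SecurityBot cases whose user has not answered within the timeout."""
    return [
        c for c in cases
        if c.owner == "SecurityBot"
        and c.status == "Open"
        and not c.manager_notified
        and now - c.message_sent_at > timedelta(minutes=c.timeout_minutes)
    ]


def manager_buttons(case_id: str) -> dict:
    """Block Kit actions block with the case ID embedded in each button's value."""
    return {
        "type": "actions",
        "elements": [
            {"type": "button",
             "text": {"type": "plain_text", "text": label},
             "action_id": f"securitybot:{action}",
             "value": case_id}
            for action, label in (("confirm", "Looks right"), ("escalate", "This isn't right"))
        ],
    }


def case_id_from_action(action: dict) -> str:
    # Slack returns the clicked button's value unchanged in the interaction payload.
    return action["value"]
```

Carrying the case ID in the button value keeps the bot stateless: whichever function handles the click can update the right case without looking anything up.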
When you look at everything you need for automation, you can see that T.A.S.E.R gives you all the tools you need to do advanced security event automation. Now all that is left to do is create a simple automation platform.