What to do

Your task is the following:

- Review each script and identify any indicators of compromise or unusual behavior.
- Summarize what the script does, highlighting any noteworthy findings.
- Think about automation. How would you detect these suspicious attributes programmatically?
- Write a detection mechanism using Rust or TypeScript to automate these detections. This should include:
    - Rules to flag suspicious activity.
    - A way to evaluate those rules to ensure effectiveness and minimize false positives.

Homework assignment Detections_Security - Senior Software Engineer (2).pdf

File 1

Indicators of compromise:

window.execScript(text);
window.eval(text);

This script extracts three strings from a script tag and sends them to "sspapi.zenyou.71360.com". If the request completes with a 200 response, the script executes the code returned by the server.
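A rule for this kind of indicator can start as a plain substring check. The sketch below is a minimal illustration, not the detector shipped in this repository: the pattern list and the function name find_dynamic_exec are assumptions made for the example, and the list is not exhaustive.

```rust
/// Call patterns that execute a string as code.
/// Assumed rule set for illustration; extend it as new samples appear.
const DYNAMIC_EXEC_PATTERNS: [&str; 3] = ["eval(", "execScript(", "new Function("];

/// Returns every pattern from `DYNAMIC_EXEC_PATTERNS` that occurs in `source`,
/// in the order the patterns are declared.
fn find_dynamic_exec(source: &str) -> Vec<&'static str> {
    DYNAMIC_EXEC_PATTERNS
        .iter()
        .copied()
        .filter(|pattern| source.contains(*pattern))
        .collect()
}
```

A substring match is cheap but imprecise: it also fires on identifiers that merely end in "eval(" and on code inside comments, so a hit is better treated as one signal among several than as a verdict.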

File 2

It contains a virtual machine and a deobfuscation function _0xfd2f that maps encoded indices to string values from _0x4720. It collects data from fields such as name, phone, and address, encodes the URL-encoded data as Base64, and sends it to https://cdn-report.com/

Indicators of compromise:

Encoded strings and data
while (!![])

Another indicator of compromise in this file is that the data is sent to a domain different from the one serving the website where the script was found. This should be mitigated with an appropriate Content Security Policy.
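The two obfuscation indicators listed above can be checked without parsing the JavaScript. This is a rough sketch under assumptions: the function names and the min_hex_identifiers threshold are hypothetical, and a real rule would need tuning against benign minified code.

```rust
/// Counts occurrences of `_0x` immediately followed by a hex digit,
/// the naming scheme seen in identifiers such as `_0xfd2f` and `_0x4720`.
fn count_hex_identifiers(source: &str) -> usize {
    source
        .match_indices("_0x")
        .filter(|(idx, _)| {
            // "_0x" is ASCII, so idx + 3 is always a valid char boundary.
            source[idx + 3..]
                .chars()
                .next()
                .map_or(false, |c| c.is_ascii_hexdigit())
        })
        .count()
}

/// Detects the `while (!![])` idiom regardless of spacing.
fn has_infinite_loop_idiom(source: &str) -> bool {
    let compact: String = source.chars().filter(|c| !c.is_whitespace()).collect();
    compact.contains("while(!![])")
}

/// Flags a file when either indicator is present.
/// `min_hex_identifiers` is an assumed tuning knob, not a value from this project.
fn looks_obfuscated(source: &str, min_hex_identifiers: usize) -> bool {
    has_infinite_loop_idiom(source) || count_hex_identifiers(source) >= min_hex_identifiers
}
```

Counting identifiers instead of matching a single one keeps the rule from firing on a file that mentions such a name once, for example in a comment.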

File 3

This script manipulates the DOM by injecting an iframe named mysite-frame. The iframe receives the original website's referrer and loads JavaScript files.

Indicators of compromise:

Creation of an iframe that loads scripts

It also attaches event listeners for different page events ("turbo:visit", "turbolinks:visit", "page:before-change", "turbo:before-cache", "turbolinks:before-cache", "turbo:load", "turbolinks:load", "page:change") to control the behavior of the injected script.

File 4

This script stores every key pressed and sends the keys to a server on the beforeunload event. It also sends all data collected from forms when a form is submitted.

Indicators of compromise:

Event listeners on form submit and on key presses.
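Listening to key presses or to submit is common in legitimate code, so a rule for this file is more useful when it requires several signals at once. The following sketch assumes the listeners are registered with quoted event names; the event lists, the sink list, and the function names are illustrative choices, not taken from the sample.

```rust
const KEY_EVENTS: [&str; 3] = ["keydown", "keyup", "keypress"];
const FLUSH_EVENTS: [&str; 2] = ["beforeunload", "submit"];
const NETWORK_SINKS: [&str; 3] = ["XMLHttpRequest", "fetch(", "sendBeacon("];

/// True when `event` appears as a single- or double-quoted string literal.
fn mentions_event(source: &str, event: &str) -> bool {
    source.contains(&format!("\"{event}\"")) || source.contains(&format!("'{event}'"))
}

/// Flags a script only when it captures keys, hooks a flush event,
/// and contains a way to send data over the network.
fn looks_like_keylogger(source: &str) -> bool {
    let captures_keys = KEY_EVENTS.iter().any(|e| mentions_event(source, e));
    let flushes = FLUSH_EVENTS.iter().any(|e| mentions_event(source, e));
    let sends = NETWORK_SINKS.iter().any(|s| source.contains(*s));
    captures_keys && flushes && sends
}
```

Requiring all three conditions trades recall for fewer false positives. Handlers assigned through properties such as onkeypress are not covered by this sketch.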

Indicators of compromise

In all these files, the main indicator of compromise is that the scripts send information to, or load code from, a domain different from that of the original website. This should be mitigated with an appropriate Content Security Policy.
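The cross-domain indicator can be approximated by extracting the hosts of absolute URLs found in a script and comparing them with an allowlist, in the same spirit as a Content Security Policy. This is a minimal sketch: the function names are hypothetical, example.com stands in for the site's own domain, and hosts built at run time (for instance from Base64 or split strings) are not seen by it.

```rust
/// Collects the distinct hosts of `http://` and `https://` URLs in `source`.
fn extract_hosts(source: &str) -> Vec<String> {
    let mut hosts: Vec<String> = Vec::new();
    for scheme in ["https://", "http://"] {
        for (idx, _) in source.match_indices(scheme) {
            let rest = &source[idx + scheme.len()..];
            // A host ends at the first character that is not a letter, digit, dot or hyphen.
            let host: String = rest
                .chars()
                .take_while(|c| c.is_ascii_alphanumeric() || *c == '.' || *c == '-')
                .collect::<String>()
                .to_ascii_lowercase();
            if !host.is_empty() && !hosts.contains(&host) {
                hosts.push(host);
            }
        }
    }
    hosts
}

/// A host is allowed when it equals an allowlisted domain or is a subdomain of one.
fn is_allowed(host: &str, allowed: &[&str]) -> bool {
    allowed
        .iter()
        .any(|domain| host == *domain || host.ends_with(&format!(".{domain}")))
}

/// Returns the hosts referenced by `source` that are outside the allowlist.
fn foreign_hosts(source: &str, allowed: &[&str]) -> Vec<String> {
    extract_hosts(source)
        .into_iter()
        .filter(|host| !is_allowed(host, allowed))
        .collect()
}
```

The allowlist would normally also contain the CDNs and analytics providers the site relies on, otherwise every third-party script is reported.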

Automation summary

To detect these suspicious attributes programmatically, the proposed solution uses two mechanisms:

- File description using the Qwen model.
- File embeddings search using the all-MiniLM-L6-v2 model.

The Qwen model reviews each file and generates a JSON file with the results, including the impact of each issue found. We then generate embeddings for the threat files we have already detected and store them in the Qdrant vector database. This allows us to detect future threats similar to the ones we have already found. Finally, we can test the system with new files.

The file embeddings search is positive if a new file has a similarity higher than 60% to a threat we have already found. The file descriptor is positive if it detects a security concern of high impact.
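The similarity test can be illustrated with cosine similarity over embedding vectors. This is a sketch under assumptions: in the project the search is delegated to Qdrant, the distance metric configured there is not stated here, and the names below are hypothetical. all-MiniLM-L6-v2 produces 384-dimensional vectors; the function works for any dimension.

```rust
/// Similarity above which a file counts as close to a known threat (60%).
const SIMILARITY_THRESHOLD: f32 = 0.6;

/// Cosine similarity of two vectors; `None` for mismatched, empty or zero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// True when `candidate` is closer than the threshold to any known threat embedding.
fn is_similar_to_known_threat(candidate: &[f32], known: &[Vec<f32>]) -> bool {
    known
        .iter()
        .filter_map(|k| cosine_similarity(candidate, k))
        .any(|similarity| similarity > SIMILARITY_THRESHOLD)
}
```

The threshold is the main lever for false positives: raising it makes the similarity signal stricter, at the cost of missing variants that were rewritten more heavily.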

- File similarity positive and file descriptor positive: the file is treated as 100% a threat.
- File similarity negative and file descriptor positive: we have possibly found a new threat file.
- File similarity positive and file descriptor negative: the file has a 50% probability of being a threat.
- File similarity negative and file descriptor negative: the file is not a threat.
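The four combinations above map directly onto a match over two booleans. The enum and function names in this sketch are illustrative and may differ from the ones used in the repository.

```rust
/// Outcome of combining the two detection signals.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Verdict {
    /// Both signals positive.
    Threat,
    /// Only the file descriptor is positive: possibly a new threat family.
    PossibleNewThreat,
    /// Only the similarity search is positive: needs manual review.
    Uncertain,
    /// Both signals negative.
    NotAThreat,
}

/// Combines the similarity result and the descriptor result into one verdict.
fn classify(similarity_positive: bool, descriptor_positive: bool) -> Verdict {
    match (similarity_positive, descriptor_positive) {
        (true, true) => Verdict::Threat,
        (false, true) => Verdict::PossibleNewThreat,
        (true, false) => Verdict::Uncertain,
        (false, false) => Verdict::NotAThreat,
    }
}
```

Because the match is exhaustive, the compiler rejects the code if a combination is left unhandled.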

Code

Requirements

- Rust with the Cargo package manager installed.
- Docker installed on your system.
- An Nvidia GPU. The more GPU memory available, the bigger the model you can run.

Vector database:

To start the vector database with Docker, run:

docker compose up qdrant

Threat detector code

Folders with files

thread_files: contains the threat files we have already found

potential_threads: contains the files we want to test

potential_threads_descriptors: contains the results of applying Qwen2.5-Coder-32B-Instruct to the files in potential_threads

Code description

You can run code_description with the following command:

cargo run --example code_description -- --top-p 0.9 --temperature 0.7 --repeat-penalty 1

It generates a JSON description of the potential threats in a file.

You can get the prompt from get_prompt in examples/code_description/main.rs and use it in the Qwen Demo.

Note: I don't have enough GPU memory to run the "2.5-coder-32b-instruct" model, so I gathered the output from the Qwen Demo. I have seen that the output quality differs between PyTorch and Candle-nn, so there might be a bug in matrix multiplication (the tokens are the same).

Parameters:

cpu: Run the code on the CPU. Not recommended because of the long run time.

temperature: Affects the variability and randomness of generated responses. Values close to 0 are more deterministic.

top_p: Nucleus sampling threshold. Sampling is restricted to the smallest set of most probable tokens whose cumulative probability exceeds this value.

seed: Seed used for randomness.

sample_len: Length of the generated sample.

repeat_penalty: Penalty applied for repeating tokens.

model: Name of the model to run, e.g. 2.5-coder-32b-instruct
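To make temperature and top_p more concrete, here is a small sketch of how the two are commonly applied to a model's output logits before sampling. It is a generic illustration with hypothetical function names, not the code Candle runs internally.

```rust
/// Converts logits to probabilities; a lower temperature sharpens the distribution.
/// `temperature` must be greater than 0 in this sketch.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the maximum logit for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|logit| ((logit - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Returns the indices of the most probable tokens, taken in descending order
/// of probability until their cumulative probability reaches `top_p`.
fn top_p_indices(probs: &[f32], top_p: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    let mut cumulative = 0.0;
    let mut kept = Vec::new();
    for idx in order {
        kept.push(idx);
        cumulative += probs[idx];
        if cumulative >= top_p {
            break;
        }
    }
    kept
}
```

With probabilities 0.5, 0.3, 0.15 and 0.05 and top_p set to 0.9, the first three tokens are kept and the last one is never sampled.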

Store threat embeddings

You can store threat embeddings with the following command:

cargo run --example store_thread_embeddings

It computes embeddings for the files in thread_files and stores them in the database.

Parameters:

cpu: Run the code on the CPU. Not recommended because of the long run time.

model_id: model used, default: sentence-transformers/all-MiniLM-L6-v2

revision: model revision used, default: refs/pr/21

Detect new threats

You can run detect_new_threads with the following command:

cargo run --example detect_new_threads

It creates embeddings for the files under test and searches for the most similar known threat file. It also checks whether the file description reports any high-impact threats.

Parameters:

cpu: Run the code on the CPU. Not recommended because of the long run time.

model_id: model used, default: sentence-transformers/all-MiniLM-L6-v2

revision: model revision used, default: refs/pr/21

Final output:

File: ./potential_threads/file6.js - is 100% a thread

File: ./potential_threads/file5.js - is 100% a thread

File: ./potential_threads/file100.js is not a thread

kujaomega/thread_detector

Folders and files

Latest commit

History

Repository files navigation

What to do

File1

File2

File 3

File 4

Indicators of compromise

Automation summary

Code

Requirements

Vector database:

Thread detector Code

Folders with files

Code description

Store threads embeddings

Detect new threads

Final output:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages