P4 is a library for automated vulnerability analysis and patching, driven by agents powered by custom models. It combines LLMs with reinforcement learning to analyze crash logs, identify vulnerable code patterns, and generate targeted patches.
- Modular Architecture: Composable components for pattern detection, policy execution, and environment management
- Multi-Language Support: C++ and Java analysis with extensible pattern matching
- LLM Integration: Policy-driven decision making using language models
- LoRA Adaptation: Dynamic model fine-tuning based on vulnerability context
- Type-Safe Design: Protocol-based interfaces with generic type parameters
- `BasePattern`: Protocol for pattern matching
- `Fragment`: Code segments with position metadata
- Language-specific patterns: `CppCallPattern`, `JavaInvocationPattern`, etc.
- `BaseChatPolicy`: LLM-driven decision making
- `BaseEraserPolicy`: Vulnerability symbol extraction
- Phased execution: Observation → Prompt → Completion → Action
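The Observation → Prompt → Completion → Action flow can be sketched as follows. This is a minimal, stdlib-only illustration; `SymbolPolicy` and the stubbed `_complete` method are hypothetical names, not P4's actual API, and a real policy would call an LLM in the completion phase.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    crash_log: str

class SymbolPolicy:
    """Hypothetical policy walking the Prompt -> Completion -> Action phases."""

    def act(self, observation: Observation) -> str:
        # Prompt: build a prompt from the observation
        prompt = f"Identify the crashing symbol in:\n{observation.crash_log}"
        # Completion: obtain a completion (stubbed here; a real policy queries an LLM)
        completion = self._complete(prompt)
        # Action: parse the completion into an action
        return completion.strip()

    def _complete(self, prompt: str) -> str:
        # Stand-in for an LLM call: pick the first token of the last log line.
        return prompt.splitlines()[-1].split()[0]

action = SymbolPolicy().act(Observation(crash_log="SIGSEGV\nstrcpy at util.c:42"))
```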
- `BaseEnvironment`: Step-based execution context
- `BaseTrainableEnvironmentWithChat`: RL support with reward functions
- `BaseRunnable`: Generic execution protocol with error handling
- `Scope`: Execution context management
- `BaseSandbox`: Isolated execution environments
- `BaseDocument`: Abstract base for all document types
- `TextDocument`: Read-only text content (crash logs)
- `FileDocument`: Source code files with path metadata
- Annotation System: Pattern-based highlighting
- `CppFunctionDefinitionTool`: AST-based C++ function extraction
- `JavaMethodDeclarationTool`: Java method identification
- Symbol Resolution: Integration with the `global` command-line tool
- Multi-tool parallel execution using `joblib`
- Episode-based execution with configurable limits
- Automatic crash log parsing and stack trace extraction
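Crash-log parsing and stack-trace extraction can be sketched with a stdlib regex. The frame format below is an ASan-style trace, and `extract_stack` is an illustrative helper, not P4's actual API:

```python
import re

# One sanitizer-style stack frame: "#N 0xADDR in SYMBOL LOCATION"
FRAME = re.compile(r"#\d+\s+0x[0-9a-f]+\s+in\s+(?P<symbol>\w+)\s+(?P<location>\S+)")

def extract_stack(log: str) -> list[tuple[str, str]]:
    """Return (symbol, file:line) pairs from an ASan-style crash log."""
    return [(m["symbol"], m["location"]) for m in FRAME.finditer(log)]

log = """\
==1==ERROR: AddressSanitizer: heap-buffer-overflow
    #0 0x4f2a10 in parse_header http.c:88
    #1 0x4f1b22 in handle_request http.c:140
"""
frames = extract_stack(log)
```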
- LangChain Integration: Automated patch generation
- Virtual File System: In-memory file modifications
- Tool-based Editing: Structured search-and-replace operations
- LoRA Client: Dynamic adapter loading/unloading
- Context-Aware Training: Automatic adapter generation from vulnerability context
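The in-memory virtual file system with structured search-and-replace editing can be sketched as follows; `VirtualFileSystem` and its method names are illustrative, not P4's actual classes:

```python
class VirtualFileSystem:
    """Holds file contents in memory; edits never touch the real filesystem."""

    def __init__(self, files: dict[str, str]):
        self._files = dict(files)  # path -> contents

    def read(self, path: str) -> str:
        return self._files[path]

    def replace(self, path: str, search: str, replacement: str) -> None:
        # Structured edit: the search text must exist, and only the
        # first occurrence is rewritten, keeping patches targeted.
        contents = self._files[path]
        if search not in contents:
            raise ValueError(f"pattern not found in {path}")
        self._files[path] = contents.replace(search, replacement, 1)

vfs = VirtualFileSystem({"util.c": "strcpy(dst, src);"})
vfs.replace("util.c", "strcpy(dst, src);", "strncpy(dst, src, sizeof dst);")
```

Keeping edits virtual lets an agent propose and evaluate candidate patches before anything is written back to disk.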
```python
class BaseRunnable[T, U, Context](Protocol):
    def run(self, x: T, context: Context) -> U: ...
```

Uses ast-grep for AST-based analysis:
```python
def match(self, source: str) -> set[Fragment]:
    root = SgRoot(source, "cpp").root()
    return {
        Fragment(value=node.text(), start_position=node.range().start.index)
        for node in root.find_all(kind="call_expression")
    }
```

Multi-tool execution is parallelized with `joblib`:

```python
documents = Parallel(n_jobs=-1, backend="threading")(
    delayed(tool.run_or_none)(symbol, scope)
    for symbol, tool in product(action, self._tools)
)
```

Dependencies:

- `pydantic` (≥2.8.2): Type-safe data modeling
- `langchain-core`: LLM integration and tool orchestration
- `langgraph` (≥0.4.7): Graph-based agent execution
- `openai` (≥1.66.3): OpenAI API integration
- `ast-grep-py` (≥0.38.3): AST-based code analysis
- `ripgrepy` (≥2.1.0): High-performance text search
- `joblib` (≥1.5.1): Parallel computing
- `requests` (≥2.32.3): HTTP client
```python
from p4 import AIxCCEnvironment, CppCallPattern, CppFunctionDefinitionTool

# Configure environment
patterns = [CppCallPattern(limit=10)]
tools = [CppFunctionDefinitionTool()]
environment = AIxCCEnvironment(tools=tools, episode_length=5, scope_builder=scope_builder)

# Execute analysis
observation = environment.reset(context)
action = policy.act(observation, previous_observation)
observation, terminated, truncated = environment.step(action, observation, context)
```

Custom patterns implement `BasePattern`:

```python
class CustomPattern(BasePattern):
    def match(self, source: str) -> set[Fragment]:
        root = SgRoot(source, "cpp").root()
        return {
            Fragment(value=node.text(), start_position=node.range().start.index)
            for node in root.find_all(pattern="<tree-sitter-kind>")
        }
```
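The same `BasePattern` shape can be prototyped without ast-grep. The stdlib-only sketch below uses a regex instead of an AST query; `RegexCallPattern` is a hypothetical name, and `Fragment` is re-declared locally so the snippet is self-contained:

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    value: str
    start_position: int

class RegexCallPattern:
    """Approximates call-expression matching with a regex (no nesting support)."""

    _CALL = re.compile(r"\b\w+\s*\([^()]*\)")

    def match(self, source: str) -> set[Fragment]:
        return {
            Fragment(value=m.group(0), start_position=m.start())
            for m in self._CALL.finditer(source)
        }

fragments = RegexCallPattern().match("int main() { puts(msg); }")
```

A regex sketch like this is useful for quick experiments, but the AST-based patterns above are the robust path: they understand nesting, comments, and string literals that regexes mishandle.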