A dynamic JAR-to-JAR bytecode identifier mapper using the ASM library from OW2. This tool analyzes two versions of a JAR file and maps obfuscated identifiers (classes, methods, fields) between them using structural similarity analysis.
This mapper is designed to match identifiers between two versions of a JAR where names have been obfuscated to patterns like:
- class[0-9]* - Obfuscated class names
- method[0-9]* - Obfuscated method names
- field[0-9]* - Obfuscated field names
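
The naming patterns can be checked with plain regular expressions. The sketch below is illustrative only: the helper names (`isObfuscatedClass` and friends) are hypothetical and not taken from the tool's source, and it assumes a name must match the pattern in full.

```kotlin
// Regexes mirroring the documented naming patterns (full-match semantics assumed).
val OBFUSCATED_CLASS = Regex("class[0-9]*")
val OBFUSCATED_METHOD = Regex("method[0-9]*")
val OBFUSCATED_FIELD = Regex("field[0-9]*")

// Hypothetical helpers: true when the whole name fits the pattern.
fun isObfuscatedClass(name: String): Boolean = OBFUSCATED_CLASS.matches(name)
fun isObfuscatedMethod(name: String): Boolean = OBFUSCATED_METHOD.matches(name)
fun isObfuscatedField(name: String): Boolean = OBFUSCATED_FIELD.matches(name)
```

Because the patterns end in `[0-9]*`, a bare `class`, `method`, or `field` with no digits also counts as obfuscated under this reading.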
The mapper uses Method 2: Iterative Similarity Scoring with the following approach:
- Match all non-obfuscated identifiers by name
- These provide anchors for propagating matches to obfuscated code
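
A minimal sketch of the anchor step, assuming each JAR has already been reduced to a set of identifier names. `findAnchors` and its parameters are hypothetical names chosen for illustration.

```kotlin
// Assumption: a name is obfuscated if it fully matches one of the documented patterns.
val OBFUSCATED_NAME = Regex("(class|method|field)[0-9]*")

/**
 * Pairs every non-obfuscated name that occurs in both JARs with itself.
 * namesA and namesB are the identifier names found in each JAR (hypothetical inputs).
 */
fun findAnchors(namesA: Set<String>, namesB: Set<String>): Map<String, String> =
    namesA.intersect(namesB)
        .filterNot { OBFUSCATED_NAME.matches(it) }  // obfuscated names never anchor by name
        .associateWith { it }
```

An obfuscated name that happens to appear in both JARs (for example `class12`) is deliberately excluded, since equal obfuscated names do not imply the same identifier.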
The algorithm iteratively matches identifiers based on structural similarity:
- Class Matching
  - Super class matches (if already matched)
  - Interface matches (if already matched)
  - Method signature similarity (descriptor patterns)
  - Field type similarity
  - Member count ratios
- Method Matching (requires matched owner class)
  - Descriptor exact match or partial match
  - Argument type matches (if types already matched)
  - Return type matches (if type already matched)
  - Constant pool similarity
  - Instruction sequence patterns
  - Static/instance modifier match
- Field Matching (requires matched owner class)
  - Field type matches (if type already matched)
  - Initial value comparison
  - Static/instance modifier match
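
Two of the method signals listed above, partial descriptor match and constant pool similarity, can be sketched without any bytecode library. This is an illustration under assumptions, not the tool's actual scoring: the function names are hypothetical, descriptors are assumed to be well formed, and constant pools are modelled as plain sets compared with Jaccard similarity.

```kotlin
/** Splits a JVM method descriptor such as "(ILjava/lang/String;[J)V" into parameter types and return type. */
fun parseDescriptor(descriptor: String): Pair<List<String>, String> {
    require(descriptor.startsWith("(")) { "not a method descriptor: $descriptor" }
    val close = descriptor.indexOf(')')
    require(close > 0) { "missing ')' in descriptor: $descriptor" }
    val params = mutableListOf<String>()
    var i = 1
    while (i < close) {
        val start = i
        while (descriptor[i] == '[') i++                           // skip array dimensions
        if (descriptor[i] == 'L') i = descriptor.indexOf(';', i)   // object types end at ';'
        i++
        params += descriptor.substring(start, i)
    }
    return params to descriptor.substring(close + 1)
}

/** 1.0 for identical descriptors, otherwise the fraction of positions (parameters plus return type) that agree. */
fun descriptorSimilarity(a: String, b: String): Double {
    if (a == b) return 1.0
    val (paramsA, retA) = parseDescriptor(a)
    val (paramsB, retB) = parseDescriptor(b)
    val slots = maxOf(paramsA.size, paramsB.size) + 1              // +1 for the return type
    var equal = if (retA == retB) 1 else 0
    for (i in 0 until minOf(paramsA.size, paramsB.size)) {
        if (paramsA[i] == paramsB[i]) equal++
    }
    return equal.toDouble() / slots
}

/** Jaccard similarity of two sets, used here as a stand-in for constant pool similarity. */
fun <T> jaccard(a: Set<T>, b: Set<T>): Double {
    if (a.isEmpty() && b.isEmpty()) return 1.0
    return a.intersect(b).size.toDouble() / a.union(b).size
}
```

The comparison here is purely textual. A fuller version would first translate obfuscated class names inside descriptors through the matches confirmed so far, which is what the "if already matched" conditions above imply.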
Matching is subject to the following constraints:
- Obfuscated identifiers can only match to other obfuscated identifiers
- Non-obfuscated identifiers can only match to other non-obfuscated identifiers
- All obfuscated classes must be in the default/base package
- Matches require:
  - Similarity score ≥ 0.85
  - Confidence gap (1st - 2nd place) ≥ 0.25
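
The two thresholds combine into a simple acceptance rule. The sketch below assumes candidate scores are available as a map from candidate name to similarity. `acceptMatch` is a hypothetical name, and treating a lone candidate as having a runner-up score of 0.0 is an assumption.

```kotlin
val MIN_SIMILARITY = 0.85
val MIN_CONFIDENCE_GAP = 0.25

/**
 * Returns the best-scoring candidate if it clears both thresholds, or null otherwise.
 * scores maps each candidate name to its similarity in the range 0..1.
 */
fun acceptMatch(scores: Map<String, Double>): String? {
    val ranked = scores.entries.sortedByDescending { it.value }
    val best = ranked.firstOrNull() ?: return null
    val runnerUp = ranked.getOrNull(1)?.value ?: 0.0   // assumption: a lone candidate has no rival
    val confident = best.value >= MIN_SIMILARITY &&
        best.value - runnerUp >= MIN_CONFIDENCE_GAP
    return if (confident) best.key else null
}
```

For example, scores of 0.92 and 0.60 yield a match, while 0.92 and 0.80 do not, because the gap of 0.12 leaves the result ambiguous.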
When a match is confirmed, weak scores propagate to related identifiers:
- Class match → super class, interfaces
- Method match → argument types, return type
- Field match → field type
This creates a cascading effect where initial anchors propagate through the codebase.
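
As a rough illustration of propagation for a confirmed class match, the sketch below adds a small bonus to the pairs implied by the two class declarations. Everything named here is hypothetical: `ClassShape` is a stand-in rather than the tool's `ClassInfo`, the bonus of 0.1 is an assumed weight, and pairing interfaces by declaration order is a simplification.

```kotlin
/** Minimal stand-in for the class metadata this step needs (hypothetical shape). */
data class ClassShape(val name: String, val superName: String?, val interfaces: List<String>)

val WEAK_BONUS = 0.1   // assumed weight for a propagated hint

/**
 * After a -> b is confirmed, nudges the scores of the related pairs:
 * (super of a, super of b) and the interfaces paired by declaration order.
 */
fun propagateClassMatch(
    a: ClassShape,
    b: ClassShape,
    bonus: MutableMap<Pair<String, String>, Double>
) {
    val related = mutableListOf<Pair<String, String>>()
    val superA = a.superName
    val superB = b.superName
    if (superA != null && superB != null) related += superA to superB
    // Only pair interfaces when both sides declare the same number of them.
    if (a.interfaces.size == b.interfaces.size) related += a.interfaces.zip(b.interfaces)
    for (pair in related) bonus[pair] = (bonus[pair] ?: 0.0) + WEAK_BONUS
}
```

The bonus accumulates, so a pair that is hinted at by several confirmed matches gradually rises toward the acceptance thresholds.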
✅ Structural Analysis
- Analyzes class hierarchies, interfaces, members
- Compares method signatures and bytecode patterns
- Examines constant pools and instruction sequences
✅ Conservative Matching
- High confidence thresholds to avoid false positives
- Would rather skip a match than match incorrectly
- Bipartite matching ensures 1-to-1 relationships
✅ Iterative Refinement
- Multiple passes to propagate matches
- Each iteration confirms high-confidence matches
- Continues until no new matches found
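
The refinement loop can be sketched as a fixed-point iteration. This is a simplified illustration, not the tool's implementation: `iterateMatches` and the `score` callback are hypothetical, and the one-to-one property is enforced with a greedy "target already taken" check rather than a full bipartite matching algorithm.

```kotlin
/**
 * Repeats scoring passes until a pass confirms nothing new.
 * score is a hypothetical callback that rates candidate b for a,
 * given the matches confirmed before the current pass.
 */
fun iterateMatches(
    namesA: List<String>,
    namesB: List<String>,
    anchors: Map<String, String>,
    score: (a: String, b: String, confirmed: Map<String, String>) -> Double
): Map<String, String> {
    val confirmed = anchors.toMutableMap()
    do {
        val before = confirmed.size
        val snapshot = confirmed.toMap()               // every score in a pass sees the same state
        val taken = confirmed.values.toMutableSet()    // targets already used, keeps the mapping 1-to-1
        for (a in namesA) {
            if (a in confirmed) continue
            val ranked = namesB.filter { it !in taken }
                .map { it to score(a, it, snapshot) }
                .sortedByDescending { it.second }
            val best = ranked.firstOrNull() ?: continue
            val runnerUp = ranked.getOrNull(1)?.second ?: 0.0
            // Documented thresholds: similarity >= 0.85 and gap >= 0.25.
            if (best.second >= 0.85 && best.second - runnerUp >= 0.25) {
                confirmed[a] = best.first
                taken += best.first
            }
        }
    } while (confirmed.size > before)                  // stop when a pass adds nothing
    return confirmed
}
```

Because the set of confirmed matches only grows and is bounded by the number of identifiers, the loop is guaranteed to terminate.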
    ./gradlew build

    ./gradlew run --args="<jar-a> <jar-b> [output-file]"

Or after building:

    java -jar build/libs/bytecode-mapper-1.0.0.jar <jar-a> <jar-b> [output-file]

- jar-a - Path to the first JAR file (source)
- jar-b - Path to the second JAR file (target)
- output-file - Optional output file for mappings (default: mappings.txt)

    ./gradlew run --args="old-version.jar new-version.jar mappings.txt"

The tool generates a text file with three sections:
## Class Mappings
class123 -> class456
class124 -> class457
## Method Mappings
class123.method1()V -> class456.method1()V
class123.method2(I)I -> class456.method5(I)I
## Field Mappings
class123.field1:I -> class456.field2:I
class123.field2:Ljava/lang/String; -> class456.field3:Ljava/lang/String;
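
For downstream tooling, the output can be read back with a few lines of Kotlin. This is a sketch that assumes the format is exactly as in the sample above: section titles start with `## ` and each entry uses ` -> ` as the separator. `parseMappings` is a hypothetical helper, not part of the tool.

```kotlin
/**
 * Parses the mapping text into: section title -> (source identifier -> target identifier).
 * Lines that are neither a section title nor an entry are ignored.
 */
fun parseMappings(text: String): Map<String, Map<String, String>> {
    val sections = linkedMapOf<String, MutableMap<String, String>>()
    var current: MutableMap<String, String>? = null
    for (raw in text.lines()) {
        val line = raw.trim()
        if (line.startsWith("## ")) {
            // Start (or reopen) a section named after the title.
            current = sections.getOrPut(line.removePrefix("## ")) { linkedMapOf() }
        } else if (" -> " in line) {
            val parts = line.split(" -> ", limit = 2)
            current?.put(parts[0], parts[1])   // entries before any section title are dropped
        }
    }
    return sections
}
```

Calling `parseMappings(File("mappings.txt").readText())` (with `java.io.File` imported) would then give a lookup table keyed by section, for example `result["Class Mappings"]`.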
- Model (com.mapper.model)
  - ClassInfo - Represents a class with metadata
  - MethodInfo - Represents a method with signature
  - FieldInfo - Represents a field
  - JarEnvironment - Container for all analyzed identifiers
- Analyzer (com.mapper.analyzer)
  - JarAnalyzer - Parses JARs using ASM
    - Extracts classes, methods, fields, constants
    - Builds instruction patterns
- Matcher (com.mapper.matcher)
  - IdentifierMatcher - Core matching algorithm
    - Implements iterative similarity scoring
    - Manages confirmed matches and propagation
Method 1 Challenges:
- Extremely complex to implement correctly
- Requires simulating execution which may differ between versions
- Entry-point dependent - misses unreachable code
- Hard to handle reflection, dynamic dispatch, native methods
- Doesn't scale well with large JARs
Method 2 Advantages:
- Proven approach (similar to existing deobfuscation tools)
- More robust to edge cases
- Easier to debug and tune
- Scales well with iterative refinement
- Handles partial information gracefully
The system uses multiple signals for scoring:
- Strong signals (high weight):
  - Exact descriptor matches
  - Matched owner classes
  - Identical constant pools
- Medium signals:
  - Partially matched type signatures
  - Similar instruction patterns
  - Matching access modifiers
- Weak signals (propagation):
  - Related types (superclass, interfaces)
  - Referenced types in signatures
  - Field types
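
One plausible way to combine tiered signals is a weighted average. The sketch below is an assumption about the general shape only: the `Signal` type, the function name, and the weights 3.0 / 2.0 / 1.0 are all hypothetical, since the tool's real weights are not documented here.

```kotlin
/** One observation about a candidate pair: how strongly it agrees (0..1) and how much it counts. */
data class Signal(val name: String, val agreement: Double, val weight: Double)

// Assumed weights for the three tiers (illustrative values only).
val STRONG = 3.0
val MEDIUM = 2.0
val WEAK = 1.0

/** Weighted average of the signals, so the result stays within 0..1. */
fun combinedScore(signals: List<Signal>): Double {
    val totalWeight = signals.sumOf { it.weight }
    if (totalWeight == 0.0) return 0.0
    return signals.sumOf { it.agreement * it.weight } / totalWeight
}
```

With these assumed weights, a perfect strong signal, a half-agreeing medium signal, and a failing weak signal combine to (3.0 + 1.0 + 0.0) / 6.0, roughly 0.67, which would fall short of the 0.85 threshold.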
The mapper uses conservative thresholds to avoid false matches:
- Minimum similarity: 0.85 (85%)
- Confidence gap: 0.25 (the best candidate must score at least 0.25 higher than the second-best)
This ensures that only high-confidence, unambiguous matches are confirmed.
Potential improvements:
- CFG Structure Comparison
  - Compare control flow graphs for method matching
  - Match based on branching patterns
- Call Graph Analysis
  - Build method call graphs
  - Match based on caller/callee relationships
- String Constant Analysis
  - Weight string constants higher (often unique)
  - Match methods using same error messages
- Machine Learning
  - Train models on known mappings
  - Improve scoring weights automatically
- Incremental Mapping
  - Support for mapping across multiple versions
  - Build confidence over time
- Kotlin 1.9.22
- ASM 9.6 (org.ow2.asm)
- JVM 11+
This project is provided as-is for educational purposes.