WIP: Baseline which can run big programs #1789

Open
bjjwwang wants to merge 33 commits into SVF-tools:master from bjjwwang:worklist

Contributor

@bjjwwang bjjwwang commented Feb 7, 2026

No caller functions as entries

AE traverses all call graph nodes and picks the nodes that have no incoming edges, i.e. functions that have no callers. These no-caller functions serve as the analysis entries.
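
As a rough illustration of the selection rule above, here is a minimal sketch on a toy call graph. `ToyCallGraph` and `collectNoCallerFunctions` are hypothetical names made up for this example; they are not SVF's `CallGraph` API, and the real implementation in this PR may differ.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: caller name -> callee names. Illustrative stand-in only,
// not SVF's CallGraph class.
using ToyCallGraph = std::map<std::string, std::vector<std::string>>;

// Return every function that never appears as a callee, i.e. every node
// with no incoming call edge. Results come back in sorted order.
std::vector<std::string> collectNoCallerFunctions(const ToyCallGraph& cg)
{
    std::set<std::string> nodes;
    std::set<std::string> called;
    for (ToyCallGraph::const_iterator it = cg.begin(); it != cg.end(); ++it)
    {
        nodes.insert(it->first);
        for (const std::string& callee : it->second)
        {
            nodes.insert(callee);
            called.insert(callee);
        }
    }

    std::vector<std::string> entries;
    for (const std::string& fn : nodes)
    {
        if (called.count(fn) == 0)
            entries.push_back(fn);
    }
    return entries;
}
```

Note that under this rule a function that is only reachable through a call cycle (for example, one that only calls itself) has an incoming edge and is therefore not picked as an entry.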

Current result (every function as entry)

LIMIT: 7200 seconds or 64GB Memory

| Test Case | Size | Status | Time | Peak Memory | ICFG Coverage | Func Coverage |
|---|---|---|---|---|---|---|
| bzip2 | 444K | ❌ ERROR | 1445.2s | 22123MB | - | - |
| cjson | 448K | ✅ SUCCESS | 1331.3s | 1280MB | 92.52% | 98.03% |
| libpng | 1.6M | 💀 OOM/KILLED | 7200.0s | - | - | - |
| lua | 1.8M | ❌ ERROR | 1.2s | 66MB | - | - |
| lz4 | 2.6M | 💀 OOM/KILLED | 7200.0s | - | - | - |
| mbedtls | 5.0M | ❌ ERROR | 343.3s | 65380MB | - | - |
| memcached | 60K | ✅ SUCCESS | 10.9s | 349MB | 88.87% | 100.00% |
| redis | 12M | ❌ ERROR | 602.0s | 65358MB | - | - |
| sqlite | 7.0M | 💀 OOM/KILLED | 7200.0s | - | - | - |
| xz | 8.0K | ✅ SUCCESS | 0.2s | 68MB | 98.86% | 100.00% |

Result (no-caller functions as entries)

| Test Case | Size | Status | Time | Peak Memory | ICFG Coverage | Func Coverage |
|---|---|---|---|---|---|---|
| bzip2 | 444K | ❌ ERROR | 1285.2s | 25644MB | - | - |
| cjson | 448K | ✅ SUCCESS | 980.9s | 2810MB | 56.27% | 86.92% |
| curl | 2.9M | ❌ ERROR | 3999.9s | 65377MB | - | - |
| expat | 1.1M | 💀 OOM/KILLED | 7200.0s | - | - | - |
| jq | 1.6M | ❌ ERROR | 2195.6s | 65378MB | - | - |
| libjpeg | 3.2M | ❌ ERROR | 3870.8s | 65377MB | - | - |
| libpng | 1.6M | ❌ ERROR | 2042.5s | 65380MB | - | - |
| lua | 1.8M | ❌ ERROR | 0.2s | 65MB | - | - |
| lz4 | 2.6M | 💀 OOM/KILLED | 7200.0s | - | - | - |
| mbedtls | 5.0M | ❌ ERROR | 537.4s | 65379MB | - | - |
| memcached | 60K | ✅ SUCCESS | 10.3s | 1252MB | 88.87% | 100.00% |
| redis | 12M | ❌ ERROR | 641.9s | 65356MB | - | - |
| sqlite | 7.0M | ❌ ERROR | 4317.0s | 65224MB | - | - |
| xz | 8.0K | ✅ SUCCESS | 0.2s | 67MB | 98.86% | 100.00% |
| zlib | 560K | 💀 OOM/KILLED | 7200.0s | - | - | - |
| zstd | 8.7M | ❌ ERROR | 383.1s | 65345MB | - | - |

bjjwwang and others added 22 commits January 26, 2026 22:18
I think you could continue to narrow down the usage of Options::HandleRecur(). I mean, maybe you could put the option check inside the function.

For example:

    // Check if this recursive call should be skipped
    if (shouldSkipRecursiveCall(callNode, funObjVar))
    {
        // In TOP mode, set return value and stores to TOP
        // In WIDEN_ONLY/WIDEN_NARROW, just skip (WTO handles it)
        if (Options::HandleRecur() == TOP)
            handleSkippedRecursiveCall(callNode);
        return;
    }

Maybe you should move the if check inside the function. That way you can keep the number of Options::HandleRecur() uses as low as possible.
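
One possible reading of this suggestion, sketched with toy types: the mode check moves into the helper so the call site no longer consults the option. `RecurMode`, `ToyAnalysis`, and the `log` member are hypothetical stand-ins invented for this example; only the function name `handleSkippedRecursiveCall` and the mode names come from the snippet above.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the value returned by Options::HandleRecur().
enum class RecurMode { TOP, WIDEN_ONLY, WIDEN_NARROW };

struct ToyAnalysis
{
    RecurMode mode;               // models Options::HandleRecur()
    std::vector<std::string> log; // records what happened, for illustration

    // The mode check lives inside the helper, so call sites do not need
    // to look at the option themselves.
    void handleSkippedRecursiveCall(const std::string& callee)
    {
        if (mode != RecurMode::TOP)
            return; // WIDEN_ONLY / WIDEN_NARROW: nothing to do here
        log.push_back("top:" + callee);
    }

    // Call site after the refactor: no option check here.
    // Returns true when the call was skipped as recursive.
    bool handleCall(const std::string& callee, bool isRecursive)
    {
        if (isRecursive)
        {
            handleSkippedRecursiveCall(callee);
            return true;
        }
        log.push_back("call:" + callee);
        return false;
    }
};
```

With this shape the option is read in one place only, which is the effect the comment asks for.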
Could you rename setRecursiveCallStoresToTop to setTopToObjInRecursion?

And then rename callFunPass to HandleFunCall? Just the rename is enough.
- Add collectEntryFunctions() to find functions without callers
- Add analyseFromAllEntries() for analyzing from all entry points
- Implement flow-sensitive join for same function called multiple times
- Add ICFG and function coverage statistics in AEStat
- Add allAnalyzedNodes set to track analyzed nodes across entry points
- Fix bottom interval assertion errors in AbsExtAPI (handleMemcpy/handleMemset)

When no main function exists, the analysis automatically starts from all
entry points (functions with no callers). Each entry point is analyzed
independently with fresh state. Coverage statistics now correctly track
all analyzed nodes across multiple entry points.
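
To make the "fresh state per entry, shared coverage set" idea concrete, here is a small sketch under simplifying assumptions: each function is mapped directly to the ICFG node ids it reaches, and "analysis" is just recording those ids. `ReachMap` and `CoverageResult` are hypothetical; `analyseFromAllEntries` and `allAnalyzedNodes` reuse the names from the commit message but are not the real SVF code.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: each function maps to the ICFG node ids it reaches.
using ReachMap = std::map<std::string, std::vector<int>>;

struct CoverageResult
{
    std::set<int> allAnalyzedNodes; // union across all entries
    double coverage;                // analyzed / total, in [0, 1]
};

CoverageResult analyseFromAllEntries(const ReachMap& reach,
                                     const std::vector<std::string>& entries,
                                     int totalNodes)
{
    CoverageResult result;
    result.coverage = 0.0;
    for (const std::string& entry : entries)
    {
        std::set<int> trace; // fresh per-entry state
        ReachMap::const_iterator it = reach.find(entry);
        if (it != reach.end())
            trace.insert(it->second.begin(), it->second.end());
        // Coverage accumulates even though the per-entry trace is discarded.
        result.allAnalyzedNodes.insert(trace.begin(), trace.end());
    }
    if (totalNodes > 0)
        result.coverage =
            static_cast<double>(result.allAnalyzedNodes.size()) / totalNodes;
    return result;
}
```

Nodes reached from several entries are counted once, so the coverage figure reflects the union rather than the sum over entries.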

Co-Authored-By: Claude <noreply@anthropic.com>
- Add new command-line option -ae-multientry (default: false)
- When false (default): analyze from main() only, preserving original behavior
  for Test-Suite test cases
- When true: analyze from all entry points (functions without callers),
  useful for library code without main function
- If no main function exists and -ae-multientry is not set, automatically
  falls back to multi-entry analysis
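
The selection rules listed in this commit message can be summarized as a small decision function. This is only a sketch: `EntryMode`, `selectEntryMode`, and both parameter names are made up for illustration, with `multiEntryFlag` modelling `-ae-multientry`.

```cpp
enum class EntryMode { MainOnly, MultiEntry };

// multiEntryFlag models -ae-multientry; hasMain says whether the module
// defines main(). Both parameter names are illustrative.
EntryMode selectEntryMode(bool multiEntryFlag, bool hasMain)
{
    if (multiEntryFlag)
        return EntryMode::MultiEntry; // explicit request
    if (!hasMain)
        return EntryMode::MultiEntry; // automatic fallback for library code
    return EntryMode::MainOnly;       // default: original behavior
}
```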

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Revert the flow-sensitive join logic in handleICFGNode that was incorrectly
breaking the original function entry state initialization. The previous change
introduced unnecessary complexity that caused incorrect state propagation
when functions are called.

The multi-entry analysis feature still works correctly because each entry
point is analyzed independently with clearAbstractTrace() called before
each entry, so flow-sensitive join at function entries is not needed.

This fixes 7 out of 8 failing test cases:
- BASIC_ptr_call2-0.c.bc
- LOOP_for_call-0.c.bc
- CWE121_Stack_Based_Buffer_Overflow__CWE129_fgets_01.c.bc
- CWE121_Stack_Based_Buffer_Overflow__CWE129_fgets_01.c.bc-widen-narrow
- CWE126_Buffer_Overread__CWE129_fgets_01.c.bc
- CWE126_Buffer_Overread__CWE129_fgets_01.c.bc-widen-narrow
- demo.c.bc-widen-narrow

The remaining failure (INTERVAL_test_10-0.c.bc) is a pre-existing bug
unrelated to the multi-entry changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. Fix BufOverflowDetector assertion (AEDetector.cpp:482)
   - When a variable is not an address type in multi-entry analysis,
     conservatively return true (assume safe) instead of asserting

2. Fix undefined compare predicate assertion (AbstractInterpretation.cpp)
   - Add support for FCMP_ORD and FCMP_UNO floating-point comparisons
   - These predicates check for NaN conditions, conservatively return [0,1]
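
For reference, a minimal sketch of what the two predicates mean on concrete doubles, and why [0,1] is a safe abstract result when NaN is not tracked. `BoolInterval`, `abstractFcmpOrdOrUno`, and `contains` are hypothetical helpers for this example and are not part of SVF.

```cpp
#include <cmath>
#include <utility>

// Concrete semantics: ORD is true when neither operand is NaN,
// UNO is true when at least one operand is NaN.
bool fcmpOrd(double a, double b) { return !std::isnan(a) && !std::isnan(b); }
bool fcmpUno(double a, double b) { return std::isnan(a) || std::isnan(b); }

// A toy boolean interval [lo, hi] with lo, hi in {0, 1}.
using BoolInterval = std::pair<int, int>;

// Conservative abstraction: without NaN tracking the result may be 0 or 1.
BoolInterval abstractFcmpOrdOrUno() { return BoolInterval(0, 1); }

// True when the concrete outcome v is covered by the interval.
bool contains(const BoolInterval& iv, bool v)
{
    const int x = v ? 1 : 0;
    return iv.first <= x && x <= iv.second;
}
```

Both concrete outcomes fall inside [0,1], so the abstraction over-approximates every possible result of either predicate.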

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
codecov bot commented Feb 7, 2026

Codecov Report

❌ Patch coverage is 83.58209% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.19%. Comparing base (6ee4a20) to head (aa86754).
⚠️ Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| svf/lib/AE/Svfexe/AbsExtAPI.cpp | 81.94% | 13 Missing ⚠️ |
| svf/lib/AE/Svfexe/AbstractInterpretation.cpp | 86.66% | 8 Missing ⚠️ |
| svf/lib/AE/Svfexe/AEDetector.cpp | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1789      +/-   ##
==========================================
+ Coverage   64.14%   64.19%   +0.04%     
==========================================
  Files         243      243              
  Lines       24626    24668      +42     
  Branches     4658     4663       +5     
==========================================
+ Hits        15797    15836      +39     
- Misses       8829     8832       +3     

| Files with missing lines | Coverage Δ |
|---|---|
| svf/include/AE/Svfexe/AbstractInterpretation.h | 95.23% <ø> (ø) |
| svf/lib/AE/Svfexe/AEDetector.cpp | 85.29% <50.00%> (-0.36%) ⬇️ |
| svf/lib/AE/Svfexe/AbstractInterpretation.cpp | 79.86% <86.66%> (+0.76%) ⬆️ |
| svf/lib/AE/Svfexe/AbsExtAPI.cpp | 90.84% <81.94%> (+0.81%) ⬆️ |

... and 2 files with indirect coverage changes

void analyse();

/// Analyze all entry points (functions without callers)
void analyseFromAllEntries();
Collaborator

analyzeFromAllProgEntries

void analyseFromAllEntries();

/// Get all entry point functions (functions without callers)
std::vector<const FunObjVar*> collectEntryFunctions();
Collaborator

collectProgEntryFuns

Contributor Author

done, the name is collectProgEntryFuns now


/// Program entry
/// Collect all entry point functions (functions without callers)
std::vector<const FunObjVar*> AbstractInterpretation::collectEntryFunctions()
Collaborator

Better to use a deque, so that you can both push to the back and push to the front (for main). Then no std::find_if is needed later.

Contributor Author

After the last meeting, we keep this version.
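
For context, the deque idea from the comment above could look roughly like this. It is a sketch only: function names are plain strings here, and `orderEntries` is a hypothetical helper, not the PR's `collectProgEntryFuns`.

```cpp
#include <deque>
#include <string>
#include <vector>

// Order entry functions so that main (if present) is analyzed first.
// Function names are plain strings here purely for illustration.
std::deque<std::string> orderEntries(const std::vector<std::string>& noCallerFuns)
{
    std::deque<std::string> entries;
    for (const std::string& fn : noCallerFuns)
    {
        if (fn == "main")
            entries.push_front(fn); // main goes to the front
        else
            entries.push_back(fn);  // everything else keeps its relative order
    }
    return entries;
}
```

Because main is placed at the front while collecting, no later search (such as std::find_if) is needed to move it.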

icfg->getGlobalICFGNode())[PAG::getPAG()->getBlkPtr()] = IntervalValue::top();

// If -ae-multientry is set, always use multi-entry analysis
if (Options::AEMultiEntry())
Collaborator

This should be by default.

Contributor Author

Done.

> This should be by default.

Done, removed it.

Comment on lines 258 to 259
getAbsStateFromTrace(
icfg->getGlobalICFGNode())[PAG::getPAG()->getBlkPtr()] = IntervalValue::top();
Collaborator

@yuleisui yuleisui Feb 7, 2026

Should this be moved to handleGlobalNode? Please also add a note explaining why this assignment is done here.

NodeID value_id = value->getId();

assert(as[value_id].isAddr());
// In multi-entry analysis, some variables may not be initialized as addresses
Collaborator

Could you add an example here?

Shall we also initialize them?


/// Program entry
/// Parse comma-separated function names from the -ae-entry-funcs option
static Set<std::string> parseEntryFuncNames()
Collaborator

move this to Options.cpp

Contributor Author

After the meeting, I removed it.

const SVFVar* arg1Val = call->getArgument(1);
IntervalValue strLen = getStrlen(as, arg1Val);
// no need to -1, since it has \0 as the last byte
// Skip if strLen is bottom or unbounded
Collaborator

Could we share as much code as possible among the string-handling functions you have?

// Use worklist-based function handling instead of recursive WTO component handling
const ICFGNode* mainEntry = icfg->getFunEntryICFGNode(cgn->getFunction());
handleFunction(mainEntry);
SVFUtil::errs() << "Warning: No entry functions found for analysis\n";
Collaborator

@yuleisui yuleisui Feb 10, 2026

We should always have at least one entry function (a function with no callers). Maybe an assert is better.

Collaborator

Could you make an assert here?

Contributor Author

> Could you make an assert here?

Sure, sure. I am double-checking all comments.

}

// Analyze from each entry point independently (Scenario 2: different entries -> fresh start)
for (const FunObjVar* entryFun : entryFunctions)
Collaborator

I think handling the global node should be done before the entry functions?

Also, it would be good to add the entry ICFGNode of each entry function to the worklist for later abstract interpretation?

Contributor Author

Yes, handleGlobalNode has now been moved before the for loop.

// Analyze from each entry point independently (Scenario 2: different entries -> fresh start)
for (const FunObjVar* entryFun : entryFunctions)
{
// Clear abstract trace for fresh analysis from this entry
Collaborator

Can handling the global node be done outside this for loop?

It is unclear why we need to clear the abstract trace here. Shouldn't the abstract states for an ICFGNode A be merged if A has two callers (when both callers are entry functions)?
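
To illustrate what merging would mean here, the sketch below joins two toy abstract states that reach the same node from two different entry functions. `Interval`, `ToyState`, `join`, and `joinInto` are hypothetical simplifications (a variable missing from one state is treated as bottom); they are not SVF's `AbstractState` or `IntervalValue`.

```cpp
#include <algorithm>
#include <map>
#include <string>

struct Interval
{
    long lo;
    long hi;
};

// Least upper bound of two intervals.
Interval join(const Interval& a, const Interval& b)
{
    return Interval{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

using ToyState = std::map<std::string, Interval>;

// Merge `incoming` into `acc`: variables present in both are joined,
// variables only in `incoming` are copied (missing is treated as bottom).
void joinInto(ToyState& acc, const ToyState& incoming)
{
    for (ToyState::const_iterator it = incoming.begin(); it != incoming.end(); ++it)
    {
        ToyState::iterator found = acc.find(it->first);
        if (found == acc.end())
            acc.insert(*it);
        else
            found->second = join(found->second, it->second);
    }
}
```

Clearing the trace between entries instead keeps each entry's result separate, which is the alternative the code under review takes.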

std::deque<const FunObjVar*> AbstractInterpretation::collectProgEntryFuns()
{
std::deque<const FunObjVar*> entryFunctions;
const CallGraph* callGraph = svfir->getCallGraph();
Collaborator

Need to use Andersen's call graph if we have one.

Contributor Author

Yes. We now use Andersen's call graph when handling call sites (especially for indirect calls).

Collaborator

Did you update the code?


// Analyze from this entry function
const ICFGNode* funEntry = icfg->getFunEntryICFGNode(entryFun);
handleFunction(funEntry);
Collaborator

Need to double-check whether Andersen's call graph should be used in handleCallsite.

Contributor Author

Yes. Andersen's call graph is in effect now.

if (!as[value_id].isAddr())
{
NodeID blkPtrId = svfir->getBlkPtr();
as[value_id] = AddressValue(AbstractState::getVirtualMemAddress(blkPtrId));
Collaborator

Should use the black hole object.

Contributor Author

Sure.

Contributor Author

I haven't finished this commit yet; I am still checking these comments and waiting for the results on the large programs.

@@ -98,7 +100,7 @@ void AbstractInterpretation::initWTO()
Collaborator

Shall the Andersen analysis and call graph initialisation be done in the constructor of AbstractInterpretation, so that the call graph is never assigned a nullptr?

Contributor Author

> Shall the Andersen analysis and call graph initialisation be done in the constructor of AbstractInterpretation, so that the call graph is never assigned a nullptr?

Yes, that makes more sense.

Collaborator

Could you move Andersen's analysis and the call graph to AbstractInterpretation's constructor?

Contributor Author

> Could you move Andersen's analysis and the call graph to AbstractInterpretation's constructor?

OK, it is now in AbstractInterpretation's constructor. I also added a CallGraphSCC member to class AbstractInterpretation in order to remove all call graph construction from initWTO().

@yuleisui
Collaborator

The CI failed.

CallGraphSCC* callGraphScc;
AEStat* stat;

std::vector<const CallICFGNode*> callSiteStack;
Collaborator

Do you need this callSiteStack? Where is it used apart from push/pop?

// while being conservatively sound.
if (!as[value_id].isAddr())
{
as[value_id] = AddressValue(InvalidMemAddr);
Collaborator

It would be good to rename InvalidMemAddr to BlackHoleObjAddr.
