Skip to content

Commit

Permalink
Denorm return by ref (chapel-lang#24519)
Browse files Browse the repository at this point in the history
This PR adds an option `--no-return-by-ref` that undoes the return-by-ref
representation for most functions where it is used, see below.
The exception is made for homogenous tuples because of a current restriction
in codegen. Another exception is made for virtual functions as handling those
would requite additional effort.

This PR also collapses prim-moves from:

    def temp1
    move temp1, expr
    move temp2, temp1

into

    move temp2, expr

As a result, given a Chapel function `f()` that returns a record `R`,
the generated code is simplified from:

    void f(R* _retArg) {
      ....
      *_retArg = something;
      return;
    }

    f(&ret_tmp);
    call_tmp = ret_tmp;

into

    R f() {
      ....
      return something;
    }

    call_tmp = f();

The above changes are done as part of the `denormalize` pass.
The undo-return-by-ref transformation is off by default because
it can decrease performance. Move-collapsing is always on
because it should be harmless.

Other changes:

Add a test of this flag by running it on the main version of SSCA2.

Extend the suggestion for simplifying the generated code for debugging
to include also `--no-return-by-ref --no-inline`.

Reset the performance playground.

While there, add a previously-missing `addFlag(FLAG_RVV)` for consistency.

While there, also enable the `--no-` form of the `--iterator-contexts` option
and move its definition so it is more alphabetically-ordered in driver.cpp.

Potential next steps:

* Look into the 7% slowdown caused by --no-return-by-ref
  in "Submitted Fannkuch-Redux Shootout Benchmark (n=12)".

* Move the return-by-ref transformation in the compiler down to codegen,
  to simplify some of the earlier passes and debugging the IR.

* Change the micro-pass `collapseTrivialMoves()` from whole-program
  to per-function. Implement it using an AstVisitor. Have this visitor
  also eliminate vacuous gotos, which I was still observing.
  Move this collapsing to the start of per-function processing,
  ex. next to/after removeUnnecessaryGotos() and undoReturnByRef().
  This earlier location might obviate the need to vetting of the prim_moves,
  see canCollapseMoveBetween() and okSymbol().
  Still, continue to keep prim_moves into config, export, and extern vars.
  This collapsing could also be run after inlining and lowerIterators()
  as those generate additional pass-through moves.

Testing: as of this PR, the entire paratest with `-compopts --no-return-by-ref`
passes in {target=gnu,llvm}*{comm=none,gasnet} configurations.

r: @dlongnecke-cray
  • Loading branch information
vasslitvinov authored Mar 5, 2024
2 parents ed27104 + 1723d62 commit b8c0d12
Show file tree
Hide file tree
Showing 20 changed files with 294 additions and 12 deletions.
3 changes: 2 additions & 1 deletion compiler/include/driver.h
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,8 @@ extern bool fNoEarlyDeinit;
extern bool fNoCopyElision;
extern bool fCompileTimeNilChecking;
extern bool fInferImplementsStmts;
extern bool fIteratorContexts;
extern bool fReturnByRef;
extern bool fOverrideChecking;
extern int ffloatOpt;
extern int fMaxCIdentLen;
Expand Down Expand Up @@ -253,7 +255,6 @@ extern bool fReportOptimizedOn;
extern bool fReportPromotion;
extern bool fReportScalarReplace;
extern bool fReportGpu;
extern bool fIteratorContexts;
extern bool fReportContextAdj;
extern bool fReportDeadBlocks;
extern bool fReportDeadModules;
Expand Down
6 changes: 4 additions & 2 deletions compiler/main/driver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,8 @@ bool fNoEarlyDeinit = false;
bool fNoCopyElision = false;
bool fCompileTimeNilChecking = true;
bool fInferImplementsStmts = false;
bool fIteratorContexts = false;
bool fReturnByRef = true;
bool fOverrideChecking = true;
bool fieeefloat = false;
int ffloatOpt = 0; // 0 -> backend default; -1 -> strict; 1 -> opt
Expand Down Expand Up @@ -295,7 +297,6 @@ bool fReportOptimizeForallUnordered = false;
bool fReportPromotion = false;
bool fReportScalarReplace = false;
bool fReportGpu = false;
bool fIteratorContexts = false;
bool fReportContextAdj = false;
bool fReportDeadBlocks = false;
bool fReportDeadModules = false;
Expand Down Expand Up @@ -1411,7 +1412,6 @@ static ArgumentDescription arg_desc[] = {
{"report-promotion", ' ', NULL, "Print information about scalar promotion", "F", &fReportPromotion, NULL, NULL},
{"report-scalar-replace", ' ', NULL, "Print scalar replacement stats", "F", &fReportScalarReplace, NULL, NULL},
{"report-gpu", ' ', NULL, "Print information about what loops are and are not GPU eligible", "F", &fReportGpu, NULL, NULL},
{"iterator-contexts", ' ', NULL, "Handle iterator contexts", "F", &fIteratorContexts, NULL, NULL},
{"report-context-adjustments", ' ', NULL, "Print debugging information while handling iterator contexts", "F", &fReportContextAdj, NULL, NULL},

{"", ' ', NULL, "Developer Flags -- Miscellaneous", NULL, NULL, NULL, NULL},
Expand Down Expand Up @@ -1441,6 +1441,7 @@ static ArgumentDescription arg_desc[] = {
{"compile-time-nil-checking", ' ', NULL, "Enable [disable] compile-time nil checking", "N", &fCompileTimeNilChecking, "CHPL_COMPILE_TIME_NIL_CHECKS", NULL},
{"infer-implements-decls", ' ', NULL, "Enable [disable] inference of implements-declarations", "N", &fInferImplementsStmts, "CHPL_INFER_IMPLEMENTS_DECLS", NULL},
{"interleave-memory", ' ', NULL, "Enable [disable] memory interleaving", "N", &fEnableMemInterleaving, "CHPL_INTERLEAVE_MEMORY", NULL},
{"iterator-contexts", ' ', NULL, "Handle iterator contexts", "N", &fIteratorContexts, NULL, NULL},
{"ignore-errors", ' ', NULL, "[Don't] attempt to ignore errors", "N", &ignore_errors, "CHPL_IGNORE_ERRORS", NULL},
{"ignore-user-errors", ' ', NULL, "[Don't] attempt to ignore user errors", "N", &ignore_user_errors, "CHPL_IGNORE_USER_ERRORS", NULL},
{"ignore-errors-for-pass", ' ', NULL, "[Don't] attempt to ignore errors until the end of the pass in which they occur", "N", &ignore_errors_for_pass, "CHPL_IGNORE_ERRORS_FOR_PASS", NULL},
Expand Down Expand Up @@ -1477,6 +1478,7 @@ static ArgumentDescription arg_desc[] = {
{"remove-empty-records", ' ', NULL, "Enable [disable] empty record removal", "n", &fNoRemoveEmptyRecords, "CHPL_DISABLE_REMOVE_EMPTY_RECORDS", NULL},
{"remove-unreachable-blocks", ' ', NULL, "[Don't] remove unreachable blocks after resolution", "N", &fRemoveUnreachableBlocks, "CHPL_REMOVE_UNREACHABLE_BLOCKS", NULL},
{"replace-array-accesses-with-ref-temps", ' ', NULL, "Enable [disable] replacing array accesses with reference temps (experimental)", "N", &fReplaceArrayAccessesWithRefTemps, NULL, NULL },
{"return-by-ref", ' ', NULL, "Enable return-by-ref of structs in the generated code", "N", &fReturnByRef, NULL, NULL},
{"incremental", ' ', NULL, "Enable [disable] using incremental compilation", "N", &fIncrementalCompilation, "CHPL_INCREMENTAL_COMP", NULL},
{"minimal-modules", ' ', NULL, "Enable [disable] using minimal modules", "N", &fMinimalModules, "CHPL_MINIMAL_MODULES", NULL},
{"parallel-make", 'j', NULL, "Specify degree of parallelism for C back-end", "I", &fParMake, "CHPL_PAR_MAKE", &turnIncrementalOn},
Expand Down
259 changes: 259 additions & 0 deletions compiler/passes/denormalize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,9 @@

#include "global-ast-vecs.h"

static void undoReturnByRef(FnSymbol* fn);
static void collapseTrivialMoves();

//helper datastructures/types
typedef std::pair<Expr*, Type*> DefCastPair;
typedef std::map<SymExpr*, DefCastPair> UseDefCastMap;
Expand Down Expand Up @@ -89,6 +92,8 @@ void denormalize(void) {
forv_Vec(FnSymbol, fn, gFnSymbols) {
// remove unused epilogue labels
removeUnnecessaryGotos(fn, true);
if (!fReturnByRef && fn->hasFlag(FLAG_FN_RETARG))
undoReturnByRef(fn);

bool isFirstRound = true;
do {
Expand All @@ -112,6 +117,8 @@ void denormalize(void) {
isFirstRound = false;
} while(deferredSyms.size() > 0);
}

collapseTrivialMoves();
}
}

Expand Down Expand Up @@ -669,3 +676,255 @@ inline bool unsafeExprInBetween(Expr* e1, Expr* e2, Expr* exprToMove,
}
return false;
}

/////////////////////////////////////////////////////////////////////////////
// undoReturnByRef()
//
// Ultimately we may want to avoid introducing return by ref.
// For now, simply undo it as a denormalization step.

///////////
//
// transformRetTempDef() will modify 'fn' to replace
// =(refArg, something)
// return(void)
// with:
// return(something)

struct ReturnByRefDef {
FnSymbol* fn; // the function that returns by ref
ArgSymbol* refArg; // the function's formal used for indirect return
bool skipLnFnArgs; // if true, refArg is fn's 3rd-last formal
};

static bool isGoodRefArg(FnSymbol* fn, ReturnByRefDef& info,
bool skippedLnFnArgs, Expr* argDef) {
ArgSymbol* refArg = toArgSymbol(toDefExpr(argDef)->sym);
if (!refArg->hasFlag(FLAG_RETARG))
return false; // give up; maybe look harder?
INT_ASSERT(!strcmp(refArg->name, "_retArg"));
INT_ASSERT(fn->retType = dtVoid);

info = { fn, refArg, skippedLnFnArgs };
return true;
}

static bool acceptableDef(FnSymbol* fn, ReturnByRefDef& info) {
// isGoodRefArg() fills 'info'
if (! isGoodRefArg(fn, info, false, fn->formals.tail) &&
! (fn->numFormals() >= 3 &&
isGoodRefArg(fn, info, true, fn->formals.tail->prev->prev)) )
return false;

if (info.refArg->type->symbol->hasFlag(FLAG_STAR_TUPLE))
return false; // codegen requires star tuples to be passed by ref

for_SymbolSymExprs(refSE, info.refArg) {
if (CallExpr* call = toCallExpr(refSE->parentExpr)) {
if (call->isPrimitive(PRIM_NO_ALIAS_SET))
continue; // any of these are ok
if (! call->isPrimitive(PRIM_ASSIGN))
return false; // need more work to handle, ex., PRIM_SET_MEMBER
// transformRetTempDef() will assert that we have only one PRIM_ASSIGN
}
}

return true;
}

static void transformRetTempDef(ReturnByRefDef& info) {
SymExpr* refUse = nullptr;
for_SymbolSymExprs(refSE, info.refArg) {
if (CallExpr* call = toCallExpr(refSE->parentExpr))
if (call->isPrimitive(PRIM_NO_ALIAS_SET)) {
if (call->numActuals() == 1) call->remove();
else refSE->remove();
continue;
}
INT_ASSERT(refUse == nullptr); // expect only a single SE
refUse = refSE;
}

CallExpr* assignCall = toCallExpr(refUse->parentExpr);
Expr* refValueExpr = assignCall->get(2)->remove();
INT_ASSERT(isSymExpr(refValueExpr)); // ensure it can be used in 'return'
// Todo: do we need to add FLAG_RVV to refValueExpr->symbol()?
// At this point temps with FLAG_RVV may occur anywhere due to inlining.
assignCall->remove();

CallExpr* returnCall = toCallExpr(info.fn->body->body.tail);
INT_ASSERT(returnCall->isPrimitive(PRIM_RETURN));
INT_ASSERT(toSymExpr(returnCall->get(1))->symbol() == gVoid);
returnCall->get(1)->replace(refValueExpr);

info.refArg->defPoint->remove();
info.fn->retType = info.refArg->type;
info.fn->removeFlag(FLAG_FN_RETARG);
}

///////////
//
// transformRetTempUse() will replace:
// call fn(args, ret_tmp)
// with:
// move(ret_tmp, call fn(args))
//
// collapseTrivialMoves() will reduce to the original 'call_temp = fn(...)'

struct ReturnByRefUse {
SymExpr* fnSE; // 'fn'
SymExpr* tempSE; // 'ret_tmp'
};

// Return true if the desired pattern is present.
static bool acceptableUse(ReturnByRefDef& defInfo, SymExpr* fnUse,
ReturnByRefUse& useInfo) {
CallExpr* call = toCallExpr(fnUse->parentExpr);
if (call == nullptr || call->resolvedFunction() != defInfo.fn)
return false;

INT_ASSERT(call == call->getStmtExpr());
SymExpr* tempSE = toSymExpr(defInfo.skipLnFnArgs ?
call->argList.tail->prev->prev : call->argList.tail);
// the temp is usually called "ret_tmp", however not necessarily
INT_ASSERT(tempSE->symbol()->type == defInfo.refArg->type); // fyi

useInfo = { fnUse, tempSE };
return true;
}

static void transformRetTempUse(ReturnByRefUse& info) {
Expr* fnCall = info.fnSE->parentExpr;
Symbol* retTemp = info.tempSE->symbol();
INT_ASSERT(! retTemp->type->symbol->hasEitherFlag(FLAG_REF, FLAG_WIDE_REF));

// replace:
// call fn(args, ret_tmp)
// with:
// move(ret_tmp, call fn(args)) // if ret_tmp is a ref, make it assign
//
SET_LINENO(fnCall);

Expr* anchor = fnCall->prev;
BlockStmt* encl = anchor ? nullptr : toBlockStmt(fnCall->parentExpr);
Expr* move = new CallExpr(retTemp->isRef() ? PRIM_ASSIGN : PRIM_MOVE,
info.tempSE->remove(), fnCall->remove());
if (anchor) anchor->insertAfter(move);
else encl->insertAtHead(move);
}

///////////
// putting it all together

static void undoReturnByRef(FnSymbol* fn) {
if (fn->hasFlag(FLAG_VIRTUAL)) return; // skip these for now

ReturnByRefDef defInfo;
if (! acceptableDef(fn, defInfo))
return;

// Make changes only if we can handle all uses of 'fn'.
// While checking, store some findings for later use.
std::vector<ReturnByRefUse> infos;
for_SymbolSymExprs(use, fn) {
ReturnByRefUse useInfo;
if (acceptableUse(defInfo, use, useInfo))
infos.push_back(useInfo);
else
return;
}

for (ReturnByRefUse& info: infos)
transformRetTempUse(info);

transformRetTempDef(defInfo);
}

/////////////////////////////////////////////////////////////////////////////
// collapseTrivialMoves() converts:
// move(source, expr) // move1
// move(dest, source) // move2
// provided:
// 'move1' and 'move2' are adjacent or have only DefExprs in between
// 'source' has no other references
// to:
// move(dest, expr)

// In some cases codegen adds an appropriate dereference, widening, etc.
// for a symbol-to-symbol move and not for a call-to-symbol move.
// So collapse only moves for which such additions are not needed.
static bool okSymbol(Symbol* sym) {
return sym->qual == QUAL_VAL || sym->qual == QUAL_CONST_VAL;
}

static bool canCollapseMoveBetween(SymExpr* dest, SymExpr* source) {
Symbol *destSym = dest->symbol(), *sourceSym = source->symbol();
return
destSym->type == sourceSym->type &&
! sourceSym->hasFlag(FLAG_CONFIG) &&
! sourceSym->hasFlag(FLAG_EXPORT) &&
! sourceSym->hasFlag(FLAG_EXTERN) &&
okSymbol(destSym) && okSymbol(sourceSym);
}

static bool closeEnough(Expr* move1, Expr* move2) {
Expr* curr = move1;
for (int dist = 0; dist < 5; dist++) { // heuristically allow <=5 defs
curr = curr->next;
if (curr == move2) return true;
if (!curr || ! isDefExpr(curr)) return false;
}
return false;
}

// Returns 'move1', or NULL if the desired pattern is not present.
// 'sourceSE' is the SymExpr for 'source' in 'move2'.
static CallExpr* singleMoveTo(CallExpr* move2, SymExpr* sourceSE) {
Symbol* source = sourceSE->symbol();
SymExpr* otherSE = source->firstSymExpr();
// Specialize for exactly two references to 'source'.
if (otherSE == sourceSE) {
// need source -> sourceSE -> otherSE -> NULL
otherSE = otherSE->symbolSymExprsNext;
if (otherSE == nullptr || otherSE->symbolSymExprsNext != nullptr)
return nullptr; // 1 or >2 references
}
else {
// need source -> otherSE -> sourceSE -> NULL
if (otherSE->symbolSymExprsNext != sourceSE ||
sourceSE->symbolSymExprsNext != nullptr)
return nullptr; // >2 references
}

CallExpr* move1 = toCallExpr(otherSE->parentExpr);
if (move1 != nullptr &&
otherSE == move1->get(1) &&
move1->isPrimitive(PRIM_MOVE) &&
(move2 == move1->next || closeEnough(move1, move2)) )
return move1;
else
return nullptr;
}

static void collapseTrivialMoves() {
// Empirically, running collapseTrivialMoves() the second time
// would not result in any additional removals. So, do it just once.
//
// This work could be done on a per-function basis using collectCallExprs().
// Simply traversing 'gCallExprs' avoids the overhead of collectCallExprs().
for_alive_in_Vec(CallExpr, move2, gCallExprs) {
if (move2->isPrimitive(PRIM_MOVE))
if (SymExpr* dest = toSymExpr(move2->get(1)))
while (true) {
if (SymExpr* source = toSymExpr(move2->get(2)))
if (canCollapseMoveBetween(dest, source))
if (CallExpr* move1 = singleMoveTo(move2, source)) {
move1->remove();
source->replace(move1->get(2)->remove());
source->symbol()->defPoint->remove();
continue; // will check for another move1 to reduce with move2
}
break; // no further opportunities with move2
} // while
} // for
}
1 change: 1 addition & 0 deletions compiler/resolution/interfaceResolution.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1334,6 +1334,7 @@ static FnSymbol* finalizeHolder(ImplementsStmt* istm, FnSymbol* reqFn,
wrapper->insertAtTail("'return'(%S)", gVoid);
} else {
VarSymbol* retTemp = newTemp("ret", wrapper->retType);
retTemp->addFlag(FLAG_RVV);
// If this is violated, need to handle that case.
INT_ASSERT(call->parentExpr == wrapper->body);
call->insertBefore(new DefExpr(retTemp));
Expand Down
11 changes: 7 additions & 4 deletions doc/rst/usingchapel/debugging.rst
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,7 @@ executable. This can be done in two steps.

.. code-block:: bash
chpl -g --target-compiler=gnu --savec <dir> --preserve-inlined-line-numbers --no-munge-user-idents <source_file>
chpl -g --target-compiler=gnu --savec <dir> --preserve-inlined-line-numbers --no-munge-user-idents --no-return-by-ref --no-inline <source_file>
For more details on these settings, read the rest of this section.

Expand All @@ -101,7 +101,7 @@ Building the Application
~~~~~~~~~~~~~~~~~~~~~~~~

The following flags can be useful for making the generated C more amenable to
debugging.
debugging. Any of them can be omitted as desired.

=================================== =========================================
Flag Description
Expand All @@ -113,13 +113,16 @@ debugging.
``--preserve-inlined-line-numbers`` When code gets inlined (e.g. replacing a
function call with the function body)
maintain the filename and line number
information of the original function
call.
information of the original function.
``--no-munge-user-idents`` Don't munge user identifiers (e.g.
variable or function names). Munging
typically prevents conflicts with
identifiers in external code but makes
debugging harder.
``--no-return-by-ref`` Don't use an extra reference argument
when compiling a Chapel function that
returns a record.
``--no-inline`` Avoid inlining in many cases.
=================================== =========================================

Notes on munging
Expand Down
1 change: 1 addition & 0 deletions test/compflags/returnByRef/ssca2/COMPOPTS
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
--no-return-by-ref
1 change: 1 addition & 0 deletions test/compflags/returnByRef/ssca2/SSCA2_Modules
1 change: 1 addition & 0 deletions test/compflags/returnByRef/ssca2/SSCA2_main.1Dtorus.good
1 change: 1 addition & 0 deletions test/compflags/returnByRef/ssca2/SSCA2_main.2Dtorus.good
Loading

0 comments on commit b8c0d12

Please sign in to comment.