Yi Li

Associate Professor

College of Computing and Data Science (CCDS)
Nanyang Technological University (NTU)

Address: Block S3-01c-104
50 Nanyang Avenue, Singapore 639798
Phone: +65 6790 4287

Intent Reverse Engineering for Smart Contracts

tl;dr: In the AI era, intent is becoming more important than code. Yet, for most software systems—including smart contracts—intent is rarely specified explicitly. Can we recover it from the artifacts that developers leave behind? Smart contracts provide a uniquely rich environment for this challenge because their code, execution histories, governance discussions, and audit reports are often publicly available. This post explores the emerging idea of intent reverse engineering: recovering formalized intent from these artifacts and transforming it into machine-readable specifications. Such capabilities may become a key building block for verification, auditing, maintenance, and AI-assisted software development.

disclaimer: this post is not meant to be a complete survey of the literature, but rather to develop a conceptual framework that may help us think about future research directions.

Background

In the age of AI coding, intent is the new source code. Software development is rapidly shifting from code-centric to intent-centric workflows, where developers increasingly specify what they want and AI systems determine how to implement it. As a result, the central challenge is no longer writing code, but communicating intent precisely.

This shift exposes what recent work refers to as the intent gap [1]: the disconnect between human intentions and actual program behaviors. Natural-language requirements are often ambiguous, incomplete, or underspecified, allowing multiple plausible implementations that may satisfy the written description while violating the developer’s true intent. Formal specifications offer a promising way to bridge this gap by providing precise, machine-interpretable representations of intended behavior.

Unfortunately, writing formal specifications remains difficult and expensive. This is a well-known and long-standing challenge in practice: formal methods require specialized expertise, significant effort, and a level of rigor that many development teams cannot afford. As a result, most software systems are built without explicit formal specifications. Instead, their intended behavior is implicitly encoded across source code, documentation, design documents, issue trackers, test suites, code reviews, and developer discussions. Over time, as systems evolve and contributors change, this intent becomes increasingly fragmented and difficult to recover.

Why Reverse Engineering?

This motivates the problem of Intent Reverse Engineering: automatically recovering formalized intent from existing software systems and their surrounding artifacts. Rather than requiring developers to write specifications from scratch, intent reverse engineering techniques aim to infer behavioral constraints, invariants, domain assumptions, and other semantic properties directly from existing implementations, execution traces, documentation, and historical development records. In the AI era, such inferred specifications can serve as machine-readable representations of intent, enabling more reliable code generation, verification, testing, and maintenance.

The idea is not entirely new. Prior work has explored many forms of specification recovery, including dynamic invariant detection (e.g., Daikon), static contract inference (e.g., Houdini), API specification mining, protocol inference, and specification synthesis techniques that derive behavioral models from code and executions. These approaches have demonstrated that useful specifications can often be recovered automatically (or semi-automatically), even when no formal specification was originally available.

Intent reverse engineering, however, aims at a broader target. First, the available evidence extends beyond source code and execution traces to include documentation, commit histories, design discussions, code reviews, issue trackers, and other development artifacts. Second, the recovered intent is not limited to low-level program properties such as preconditions, postconditions, and invariants. Instead, we seek richer semantic structures, including domain models, business rules, security assumptions, governance policies, protocol constraints, and architectural decisions. The goal is not merely to describe what the software does, but to reconstruct why it was built that way in the first place.

Why Smart Contracts?

Smart contracts provide a particularly compelling setting for intent reverse engineering. Unlike traditional software, smart contracts often manage valuable digital assets, enforce governance mechanisms, and serve as the foundational infrastructure of decentralized applications. Errors in their behavior can lead to irreversible financial losses, making precise understanding of intent especially important.

At the same time, smart contracts are unusually transparent, a consequence of the decentralized platforms on which they run. Source code, deployment artifacts, transaction histories, audit reports, governance proposals, and public discussions are frequently available to anyone. This creates a rich ecosystem of heterogeneous artifacts that collectively encode developer intent. The challenge is not the absence of information, but rather the difficulty of integrating these sources into a coherent and machine-interpretable specification.

Another important reason is that smart contracts are relatively small and self-contained compared to most traditional software systems. Their state spaces and control flows are often simpler, and their behavior is typically expressed through explicit business logic rather than complex interactions with operating systems, user interfaces, or large external dependencies. As a result, many formal techniques that struggle to scale to conventional software become practical for smart contracts. Verification, invariant inference, symbolic reasoning, and specification mining can often be applied directly to real-world contracts, making the domain an attractive testbed for intent reverse engineering.

A Roadmap for Recovering Intent

In this post, I summarize our past work that can be viewed through the lens of intent reverse engineering for smart contracts and discuss several promising directions for future research.

The intent reverse engineering process is fundamentally about transforming a collection of heterogeneous artifacts into a representation of intent, which can be viewed as a three-stage pipeline:

Intent Sources → Intent Inference → Intent Representations

The following table organizes existing efforts in this space along two dimensions. The first dimension, intent sources, concerns where developer intent is encoded. Unlike traditional software systems, smart contracts are accompanied by a rich ecosystem of publicly available artifacts, ranging from source code and transaction histories to whitepapers, governance records, and audit reports. The second dimension, intent representations, concerns the form in which recovered intent is expressed. These representations span multiple levels of abstraction, from low-level behavioral specifications such as preconditions, postconditions, and invariants, to higher-level concepts such as security properties and business rules. Together, these dimensions provide a useful framework for understanding both existing approaches and future opportunities in intent reverse engineering.

| Dimension | Category | Examples |
| --- | --- | --- |
| Intent Sources | Code-Centric | Source code, bytecode, CFGs, storage layouts, tests, user interfaces [2] |
| | Execution-Centric | Transaction histories, execution traces, event logs [3], [4], [5], [6] |
| | Documentation-Centric | Whitepapers, technical standards (e.g., ERC) |
| | Evolution-Centric | Commits, upgrade proposals (e.g., EIP), issue trackers [7] |
| | Community-Centric | Audit reports [8] |
| Intent Representations | Behavioral Specifications | Preconditions, postconditions, invariants [3], [4] |
| | State Machines | Protocol states and transitions [9] |
| | Security Properties | Authorization rules, role structures, security policies, trust assumptions [10] |
| | Business Rules | Governance structures, incentive mechanisms [11], tokenomics, business logic [12], game rules [13] |

Case Studies

To make these ideas more concrete, let us examine several examples of how intent can be recovered from different artifacts and represented in different forms. While these works were not originally framed as intent reverse engineering, they can be naturally viewed as concrete instantiations of the roadmap described above.

Case Study 1: Recovering Behavioral Specifications from Execution Histories

Among the various sources of intent discussed earlier, execution histories provide perhaps the most direct evidence of how a smart contract is expected to behave in practice. Every successful transaction reflects a sequence of actions that satisfied the contract’s checks, respected its business logic, and produced outcomes accepted by both users and the blockchain. Collectively, these transactions encode a rich record of the behavioral assumptions under which the contract operates.

This observation motivates a simple but powerful idea: if intent is not explicitly documented, perhaps it can be inferred from observed behavior. Rather than asking developers to write specifications manually, we can analyze historical transactions and execution traces to recover behavioral properties that appear to hold consistently during contract operation. Such properties may reveal implicit preconditions, postconditions, state invariants, and relationships among contract variables, providing a machine-readable approximation of the contract’s intended behavior.

Along this line, we developed InvCon [4] and its successor InvCon+ [3], which infer behavioral specifications from transaction histories and execution traces. The recovered specifications capture properties such as preconditions, postconditions, and state invariants, transforming implicit behavioral assumptions into explicit and machine-readable representations (see examples below).

[Figure: Common ERC-20 invariants inferred by InvCon+]

InvCon adopts a dynamic invariant detection approach similar to Daikon, mining likely invariants from observed executions based on predefined templates. InvCon+ extends this approach by combining dynamic inference with static analysis and formal verification, allowing candidate invariants to be validated against the contract implementation and enabling the discovery of specifications beyond those directly observable from historical traces.
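To give a feel for template-based dynamic invariant detection, here is a minimal Python sketch. It is not InvCon's implementation: the snapshot variables (`totalSupply`, `sumBalances`, `allowance`), the three comparison templates, and the toy data are all assumptions chosen for illustration. The sketch instantiates each template over every pair of variables and keeps the candidates that hold in all observed snapshots.

```python
from itertools import combinations

# Hypothetical post-state snapshots recorded after successful transactions
# on a toy ERC-20-like token. Variable names and values are illustrative.
SNAPSHOTS = [
    {"totalSupply": 1000, "sumBalances": 1000, "allowance": 0},
    {"totalSupply": 1000, "sumBalances": 1000, "allowance": 50},
    {"totalSupply": 1200, "sumBalances": 1200, "allowance": 20},
]

# Predefined invariant templates over pairs of variables (an assumed, tiny set).
TEMPLATES = {
    "{a} == {b}": lambda x, y: x == y,
    "{a} <= {b}": lambda x, y: x <= y,
    "{a} >= {b}": lambda x, y: x >= y,
}

def mine_likely_invariants(snapshots):
    """Return every template instance that holds in all observed snapshots."""
    variables = sorted(snapshots[0])
    found = []
    for a, b in combinations(variables, 2):
        for text, check in TEMPLATES.items():
            if all(check(s[a], s[b]) for s in snapshots):
                found.append(text.format(a=a, b=b))
    return found

INVARIANTS = mine_likely_invariants(SNAPSHOTS)
for invariant in INVARIANTS:
    print(invariant)
```

On this toy data the sketch reports `sumBalances == totalSupply`, but also `allowance <= totalSupply`, which merely happens to hold in the three snapshots and is not guaranteed by ERC-20. Candidates of this kind are only "likely" invariants, which is one reason to validate them against the contract implementation.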

Viewed through the lens of intent reverse engineering, these systems recover behavioral specifications from execution-centric artifacts. The inferred invariants provide a machine-readable approximation of the assumptions that developers expect to hold throughout contract execution.

An important question, however, is whether such recovered specifications are actually useful. In our subsequent work [5], we conducted the first systematic study of invariant effectiveness against real-world DeFi attacks. The results show that a substantial fraction of attacks manifest as violations of inferred invariants, suggesting that these specifications capture meaningful aspects of protocol intent. Beyond their role in documentation and program understanding, recovered behavioral specifications can therefore serve as practical security monitors for detecting deviations from intended behavior.
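As a rough illustration of how recovered invariants could act as a runtime monitor, the sketch below checks a state transition against a small set of invariants and reports the ones it breaks. The state layout, the invariant names, and the assumption that the workload consists only of transfers are hypothetical. This is not the setup used in [5].

```python
# Hypothetical invariants, written as predicates over the contract state
# before and after a transaction. Names and state layout are illustrative,
# and the first check assumes a transfer-only workload.
INVARIANT_CHECKS = {
    "totalSupply is unchanged":
        lambda pre, post: pre["totalSupply"] == post["totalSupply"],
    "sum of balances equals totalSupply":
        lambda pre, post: sum(post["balances"].values()) == post["totalSupply"],
}

def violated_invariants(pre, post):
    """Return the names of all invariants that the state transition breaks."""
    return [name for name, holds in INVARIANT_CHECKS.items() if not holds(pre, post)]

PRE = {"totalSupply": 100, "balances": {"alice": 60, "bob": 40}}
# A plain transfer of 10 tokens from alice to bob.
BENIGN = {"totalSupply": 100, "balances": {"alice": 50, "bob": 50}}
# A transition in which bob gains tokens that nobody lost.
ANOMALOUS = {"totalSupply": 100, "balances": {"alice": 60, "bob": 90}}

print(violated_invariants(PRE, BENIGN))     # []
print(violated_invariants(PRE, ANOMALOUS))  # ['sum of balances equals totalSupply']
```

A monitor of this shape flags deviations from previously observed behavior. Whether a flagged transaction is actually an attack still needs to be judged separately.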

Case Study 2: Recovering Domain Models to Support Security Intent

Behavioral specifications such as invariants and preconditions capture how a contract behaves. However, many important forms of intent cannot be expressed solely in terms of state variables and transaction effects. Security requirements often depend on the broader domain context in which a contract operates. Questions such as “what roles exist within the system?”, “who is allowed to perform a particular action?”, and “what constitutes a valid game move?” require an understanding of the domain model underlying the contract.

This observation motivates a second form of intent reverse engineering: recovering the semantic structures that define the environment in which contract behaviors should be interpreted. Rather than directly inferring behavioral properties, the goal is to reconstruct concepts such as role hierarchies, authorization relationships, protocol participants, and game rules. These domain models provide the foundation upon which higher-level security and business specifications can be expressed.

Our work on SpCon [10] illustrates this idea. Access-control vulnerabilities remain one of the most common causes of smart contract exploits, yet authorization policies are rarely documented explicitly. Instead, role relationships are implicitly encoded through permission checks scattered across contract functions. SpCon analyzes contract code and benign past transactions to recover a role-based model of the system, identifying privileged entities, protected operations, and the relationships between them. Once this role structure is reconstructed, high-level authorization specifications can be formulated and checked automatically. For example, the system can reason about whether sensitive operations are restricted to intended administrators, whether privilege escalation paths exist, or whether different roles are granted inconsistent permissions. In this sense, the recovered role model serves as a machine-readable representation of the contract’s security intent.
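The role-recovery idea can be sketched in a few lines of Python. The version below is a deliberately simplified stand-in, not SpCon's algorithm: it assumes a hypothetical history of `(sender, function)` pairs, groups senders by the exact set of functions they have called, and flags a call when a sender invokes a function that only a strict subset of known senders has used before.

```python
from collections import defaultdict

# Hypothetical benign transaction history as (sender, function) pairs.
HISTORY = [
    ("0xAdmin", "pause"), ("0xAdmin", "mint"), ("0xAdmin", "transfer"),
    ("0xMinter", "mint"), ("0xMinter", "transfer"),
    ("0xAlice", "transfer"), ("0xBob", "transfer"),
]

def mine_roles(history):
    """Group senders by the exact set of functions they have called."""
    called = defaultdict(set)
    for sender, function in history:
        called[sender].add(function)
    roles = defaultdict(set)
    for sender, functions in called.items():
        roles[frozenset(functions)].add(sender)
    return dict(roles)

def is_suspicious(roles, sender, function):
    """Flag a call to a function that only some senders used, by someone else."""
    permitted = set()
    for functions, members in roles.items():
        if function in functions:
            permitted |= members
    all_senders = set().union(*roles.values())
    privileged = permitted != all_senders  # not everyone has called it
    return privileged and sender not in permitted

ROLES = mine_roles(HISTORY)
print(is_suspicious(ROLES, "0xAlice", "mint"))        # True
print(is_suspicious(ROLES, "0xMinter", "mint"))       # False
print(is_suspicious(ROLES, "0xMallory", "transfer"))  # False
```

Grouping by exact call sets is the crudest possible notion of a role. It is used here only to show how a role model, once recovered, turns into a checkable authorization rule.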

A similar perspective appears in our work on specification mining for smart contracts [13], [9]. Many contracts implement application-specific protocols whose correctness depends on domain-specific rules rather than generic invariants. Consider blockchain-based games, auctions, or governance systems. Properties such as "a player may only move after joining the game", "a winner can only be declared after the game ends", or "a bid must exceed the current highest bid" are meaningful only when interpreted within the protocol’s state machine and participant model. To recover such specifications, we analyze execution traces using trace slicing and predicate abstraction techniques, extracting higher-level behavioral rules that describe interactions among participants and protocol states (see an example below). The resulting specifications capture not merely low-level program behavior, but the game rules and operational semantics that define the application’s intended functionality.

[Figure: Game rules mined from an Ethereum dice game called Dicether]
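Rules such as "a player may only move after joining the game" are ordering constraints over protocol events. The sketch below mines simple "a must precede b" rules from per-session event traces. It is a plain precedence miner under assumed inputs (the event names and traces are hypothetical), and it does not implement the trace slicing or predicate abstraction used in [9].

```python
from itertools import permutations

# Hypothetical event traces, one per game session. Event names are illustrative.
GAME_TRACES = [
    ["join", "move", "move", "endGame", "declareWinner"],
    ["join", "endGame", "declareWinner"],
    ["join", "move", "endGame"],
]

def mine_precedence_rules(traces):
    """Return pairs (a, b) such that a occurs before the first b in every trace containing b."""
    events = sorted({event for trace in traces for event in trace})
    rules = []
    for a, b in permutations(events, 2):
        relevant = [trace for trace in traces if b in trace]
        if relevant and all(a in trace[: trace.index(b)] for trace in relevant):
            rules.append((a, b))
    return rules

RULES = mine_precedence_rules(GAME_TRACES)
for before, after in RULES:
    print(f"{before} must precede {after}")
```

On these traces the miner reports that `join` precedes `move`, `endGame`, and `declareWinner`, and that `endGame` precedes `declareWinner`. It does not report that `move` precedes `endGame`, because the second session ends without any move.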

Viewed through the lens of intent reverse engineering, both systems recover domain models from code-centric and execution-centric artifacts. These models act as an intermediate layer between implementation details and high-level intent. Rather than directly inferring security policies or business rules, they reconstruct the semantic vocabulary needed to express such concepts in the first place. Once the role structures, protocol states, and interaction patterns are recovered, richer specifications can be formulated, verified, and monitored automatically.

More broadly, these examples suggest that intent recovery may benefit from a hierarchical process. Before recovering high-level intent, we may first need to recover the conceptual models that developers implicitly assume when designing the system. In many cases, understanding the domain may be a prerequisite for understanding the intent itself.

AI’s Role in Intent Recovery

Most of the techniques discussed so far rely on structured artifacts such as source code, execution traces, and transaction histories. While these artifacts provide valuable signals about program behavior, they represent only a fraction of the information that developers and communities use to communicate intent. Much of a protocol’s rationale, governance structure, economic assumptions, and security expectations is documented in natural-language artifacts such as whitepapers, governance proposals, audit reports, technical discussions, and community forums.

Historically, these sources have been difficult to utilize systematically. Unlike program code, natural-language documents are often ambiguous, incomplete, and highly domain-specific. As a result, traditional specification mining techniques have largely focused on artifacts that can be analyzed using static analysis, dynamic analysis, or formal reasoning.

Recent advances in large language models fundamentally change this landscape. Modern AI systems possess strong capabilities in code understanding, natural-language comprehension, information extraction, summarization, and cross-document reasoning. These capabilities make it possible to recover intent from sources that were previously inaccessible to automated analysis, significantly broadening the scope of intent reverse engineering.

One example comes from our work on governance analysis in decentralized finance [11]. Governance mechanisms define many of the most important business rules of a protocol, including voting rights, ownership structures, reward distributions, and decision-making procedures. Such rules are often described primarily in whitepapers and governance documents rather than encoded directly in smart contract logic. By leveraging large language models, we were able to automatically extract governance-related information from protocol documentation and reconstruct governance structures at scale. The resulting representations capture high-level organizational intent that would be difficult, if not impossible, to recover through code analysis alone.

Beyond extracting intent from unstructured artifacts, AI also enables a second capability: intent transfer. Human developers frequently reuse design patterns, business models, and security mechanisms across projects. Consequently, many forms of intent are not unique to a single system but recur across families of similar applications. Rather than inferring specifications entirely from scratch, AI systems can leverage previously recovered knowledge and adapt it to new contexts.

This idea is illustrated by our work on PropertyGPT [8]. The key observation is that security auditors routinely write high-quality formal specifications when verifying smart contracts. These specifications encode valuable expert knowledge about common business rules, security assumptions, and protocol behaviors. PropertyGPT retrieves specifications from previously audited contracts and uses large language models to adapt them to new contracts with similar structures and functionalities. In effect, the system treats existing specifications as reusable intent artifacts and performs intent transfer across related protocols.
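The retrieval half of this idea can be illustrated with a small sketch. It is only a stand-in under stated assumptions: the specification library, the use of function names as features, and Jaccard similarity as the ranking function are all hypothetical choices made for brevity, and they are not a description of PropertyGPT's retrieval. The adaptation step, which would be delegated to a large language model, is left out.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical library of specifications from previously audited contracts,
# indexed by the function names of the contract each one was written for.
SPEC_LIBRARY = [
    {"functions": ["transfer", "approve", "transferFrom", "balanceOf"],
     "spec": "transfer preserves the sum of balances"},
    {"functions": ["bid", "withdraw", "endAuction"],
     "spec": "a new bid must exceed the current highest bid"},
]

def retrieve_specs(target_functions, library, top_k=1):
    """Return the specs of the top_k library entries most similar to the target."""
    ranked = sorted(
        library,
        key=lambda entry: jaccard(target_functions, entry["functions"]),
        reverse=True,
    )
    return [entry["spec"] for entry in ranked[:top_k]]

# A new, unaudited contract that looks like an auction.
CANDIDATES = retrieve_specs(["bid", "withdraw", "refund"], SPEC_LIBRARY)
print(CANDIDATES)  # ['a new bid must exceed the current highest bid']
```

The retrieved specification is only a starting point. It still has to be rewritten against the identifiers and state variables of the new contract and then checked, since a property that held for one auction contract may not be intended for another.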

Viewed through the lens of intent reverse engineering, AI is not merely another inference technique. Rather, it expands both the sources from which intent can be recovered and the mechanisms by which intent can be propagated. Instead of relying solely on structured program artifacts, future systems may synthesize evidence from code, executions, documentation, governance discussions, audit reports, and historical specifications. Likewise, instead of recovering intent independently for every project, they may continuously accumulate, refine, and transfer intent knowledge across entire software ecosystems.

This shift suggests a broader vision for the future. If traditional specification mining can be viewed as recovering intent from a single artifact, AI-driven intent recovery may ultimately become a knowledge-centric process that integrates information across heterogeneous sources and reuses intent across related systems. Such capabilities could transform intent from a scarce and manually produced resource into a reusable asset that evolves alongside software itself.

Future Directions

The vision of intent reverse engineering is still in its early stages. Looking ahead, several open research questions appear particularly promising.

1. Multi-Level and Multi-Source Intent Fusion

Current approaches typically recover fragments of intent from individual artifacts, such as source code, execution traces, documentation, or governance discussions. At the same time, intent exists at multiple levels of abstraction, ranging from function-level behavioral contracts to high-level business objectives. A key challenge is to integrate intent across both different sources and different abstraction levels into a coherent and unified specification. This raises fundamental questions:

  • How should conflicting evidence be reconciled when different artifacts suggest different intentions?
  • How can low-level behavioral constraints be connected to high-level business goals?

Addressing these challenges will likely require new techniques for evidence integration, uncertainty modeling, and cross-level reasoning about intent.

2. Human-in-the-Loop Intent Recovery

Automatically recovered intent may not always be accurate or complete. In many cases, different artifacts may provide ambiguous or even conflicting signals, and purely automated systems may struggle to resolve these discrepancies. This suggests the need for human-in-the-loop approaches, where developers, auditors, or domain experts actively participate in the intent recovery process. Key questions include:

  • How can we design systems that effectively incorporate human feedback into intent inference?
  • How should recovered intent be presented to users to support validation, correction, and refinement?

Rather than fully automating intent recovery, future systems may combine automated inference with interactive tools that allow humans to guide, validate, and refine the recovered specifications.

3. Intent Evolution

Intent is not static. Smart contracts are upgraded, governance decisions modify protocol objectives, and communities redefine acceptable behavior over time. This perspective aligns with earlier work [14], which emphasizes the role of change intention in understanding how software evolves. This raises an important question:

  • How can we track the evolution of intent across protocol lifecycles?

Understanding intent drift and intent evolution remains largely unexplored and aligns naturally with the evolution-centric sources discussed earlier.

4. Intent-Centric Software Engineering

More broadly, intent reverse engineering points toward a larger shift in software engineering. As AI increasingly automates implementation, software development may become fundamentally centered on intent management rather than code production.

In such a world, intent reverse engineering is no longer merely a maintenance activity. It becomes an essential mechanism for recovering, validating, and evolving the specifications that govern software systems. Smart contracts, with their rich collection of public artifacts and strong correctness requirements, provide a unique opportunity to pioneer this transition toward intent-centric software engineering.

References

  1. Lahiri, S. K. (2026). Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents. arXiv preprint arXiv:2603.17150.
  2. Liu, Y., Li, X., & Li, Y. (2025, November). DeepTx: Real-Time Transaction Risk Analysis via Multi-Modal Features and LLM Reasoning. Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE).
  3. Liu, Y., Zhang, C., & Li, Y. (2025). Automated Invariant Generation for Solidity Smart Contracts. IEEE Transactions on Dependable and Secure Computing.
  4. Liu, Y., & Li, Y. (2022). InvCon: A Dynamic Invariant Detector for Ethereum Smart Contracts. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE), 1–4.
  5. Chen, Z., Liu, Y., Beillahi, S. M., Li, Y., & Long, F. (2024). Demystifying Invariant Effectiveness for Securing Smart Contracts. Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE), 1772–1795.
  6. Chen, Z., Liu, Y., Beillahi, S. M., Li, Y., & Long, F. (2024). OpenTracer: A Dynamic Transaction Trace Analyzer for Smart Contract Invariant Generation and Beyond. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2399–2402.
  7. Liu, Y., Li, S., Wu, X., Li, Y., Chen, Z., & Lo, D. (2024). Demystifying the Characteristics for Smart Contract Upgrades. arXiv preprint arXiv:2406.05712.
  8. Liu, Y., Xue, Y., Wu, D., Sun, Y., Li, Y., Shi, M., & Liu, Y. (2025, February). PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation. Proceedings of the 32nd Annual Network and Distributed System Security Symposium (NDSS).
  9. Liu, Y., Liu, Y., Li, Y., & Artho, C. (2025, March). Specification Mining for Smart Contracts with Trace Slicing and Predicate Abstraction. Proceedings of the 32nd IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER).
  10. Liu, Y., Li, Y., Lin, S.-W., & Artho, C. (2022). Finding Permission Bugs in Smart Contracts with Role Mining. Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 716–727.
  11. Ma, W., Zhu, C., Liu, Y., Xie, X., & Li, Y. (2025). A Comprehensive Study of Governance Issues in Decentralized Finance Applications. ACM Transactions on Software Engineering and Methodology, 34(7), 1–31.
  12. Gao, J., Zhang, Z., Sun, Y., Liu, Y., Liu, C., Liu, H., Li, Y., & Liu, Y. (2026). LogicScan: An LLM-driven Framework for Detecting Business Logic Vulnerabilities in Smart Contracts. arXiv preprint arXiv:2602.03271.
  13. Liu, Y., Li, Y., Lin, S.-W., & Zhao, R. (2020). Towards Automated Verification of Smart Contract Fairness. Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE), 666–677.
  14. Krüger, J., Li, Y., Zhu, C., Chechik, M., Berger, T., & Rubin, J. (2023). A Vision on Intentions in Software Engineering. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE), 2117–2121.