Nanus Compiler Fixes: Boost Stability & Error Handling

Introduction: Why Nanus Needs a Tune-Up

Hey everyone! Let's talk about Nanus, our compact and readable compiler and a key player in our quest to bring Rivus to life. While Nanus is doing a stellar job in many areas, we've identified a few crucial spots where its lexer and parser contracts are, shall we say, a little loose. These aren't just minor quirks; they're contract violations that can silently break correctness or lead to seriously confusing failures down the line. Imagine building a magnificent bridge where some of the foundational bolts aren't quite tightened: that's the kind of subtle but critical issue we're aiming to address. Left alone, these problems will also make our planned port to Go a real headache. The goal here isn't just a quick patch; it's about making Nanus a truly stable and robust subset compiler for Rivus: predictable, reliable, and a joy to work with, not a source of mystery bugs or head-scratching errors. This article walks through the highest-leverage fixes, explaining for each one the problem, the impact, and the proposed solution, so everyone understands why these enhancements are vital for the compiler's future. This isn't just bug-fixing; it's proactive engineering to build a stronger system from the ground up, ensuring a smoother journey for our codebase and our development team.

Critical Fixes: Stabilizing Nanus's Core

First up, let's tackle some of the most pressing issues that are absolutely P0, meaning they need our immediate attention because they can seriously mess things up right now. These are foundational problems that, if left unaddressed, will cause cascading failures and make any further development a nightmare. We’re talking about basic functionalities that need to be ironed out for Nanus to truly shine.

Decoding Generics: The Lexer/Parser Mismatch

Alright, guys, let's dive into a pretty significant hurdle with Nanus's generics handling: the current lexer/parser mismatch. This is a classic case where the left hand (the lexer) isn't talking to the right hand (the parser) effectively, leading to some serious confusion when it comes to sophisticated type expressions. Specifically, the fons/nanus/lexer.ts module currently does not tokenise the essential angle brackets, < and >, at all. Seriously, they're not recognized as Operator or Punctuator tokens. Think about that for a second: these characters are fundamental for defining generic types in modern programming languages, yet our lexer just breezes past them, acting like they don't exist. This is a huge oversight, as generics are becoming increasingly important for writing flexible and reusable code.

Now, here's where it gets really interesting, and frankly, a bit problematic. Over in fons/nanus/parser.ts, the parser absolutely expects these very same angle brackets! You'll find calls like match('Operator', '<') and expect('Operator', '>') scattered throughout crucial parsing functions such as parseVaria(), parseFunctio(), and parseTypusPrimary(). So, we have a situation where the lexer, which is responsible for breaking down the raw source code into meaningful tokens, completely ignores < and >, while the parser, which builds the abstract syntax tree from these tokens, is actively searching for them. This is a fundamental breakdown in the contract between the lexer and parser.

What's the current behavior when you try to use generic types in Nanus? Well, it's not pretty. Any attempt to use a generic type will immediately hit a wall. The lexer, upon encountering these unrecognized characters, will either trigger its "unexpected character" path – which is a sign of a deeper issue – or it will completely skip them, leading to utterly confusing parse failures. Imagine writing perfectly valid code with generics, only for the compiler to choke on it, giving you an obscure error message that makes no sense given your input. This isn't just frustrating; it undermines confidence in the compiler and makes debugging incredibly difficult. Developers will spend countless hours trying to figure out why their code isn't parsing correctly, only to find out it's a fundamental issue with how the compiler processes basic syntax elements. It's like trying to read a book where every time you see a comma, the book just deletes it and tells you the sentence structure is wrong – incredibly unhelpful and misleading. The impact is significant: it completely blocks the use of generics, severely limiting the expressiveness and utility of Nanus for any non-trivial projects that rely on generic programming paradigms. Our entire type system and type inference capabilities are held back by this singular oversight, making it a critical P0 issue that demands immediate attention.

So, what's the fix? It’s straightforward but essential: we need to properly add < and > tokenisation to the lexer. This involves deciding whether they should be classified as Operator or Punctuator. The most important part here is consistency: once we make that decision, we must ensure it’s applied uniformly across the entire parser. If the lexer produces an Operator token for <, then the parser must always expect an Operator token when it's looking for <. This seemingly small change will unlock the full potential of generics in Nanus, making our compiler much more powerful, flexible, and, most importantly, correct. It will enable developers to write robust, type-safe code using generics without fear of unexpected compilation errors, significantly enhancing the overall developer experience and the capability of the Nanus language. It’s a foundational step towards a more mature and reliable compiler.
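To make the idea concrete, here's a minimal sketch of what the fix looks like. The Token shape and kind names below are assumptions chosen to mirror the parser's match('Operator', '<') calls, not the actual fons/nanus/lexer.ts types:

```typescript
// A minimal sketch of the missing tokenisation. The Token shape and the
// choice of "Operator" for angle brackets are assumptions modelled on the
// parser's existing match('Operator', '<') / expect('Operator', '>') calls.
type Token = { kind: "Operator" | "Identifier" | "EOF"; value: string };

function tokenize(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === " " || ch === "\t" || ch === "\n") { i++; continue; }
    // The fix: recognise angle brackets, classified here as Operator so
    // the parser's existing expectations line up with the lexer's output.
    if (ch === "<" || ch === ">") {
      tokens.push({ kind: "Operator", value: ch });
      i++;
      continue;
    }
    if (/[A-Za-z_]/.test(ch)) {
      let j = i + 1;
      while (j < src.length && /[A-Za-z0-9_]/.test(src[j])) j++;
      tokens.push({ kind: "Identifier", value: src.slice(i, j) });
      i = j;
      continue;
    }
    throw new Error(`unexpected character '${ch}'`);
  }
  tokens.push({ kind: "EOF", value: "" });
  return tokens;
}
```

With this in place, an input like Lista&lt;T&gt; yields Identifier, Operator, Identifier, Operator instead of tripping the unexpected-character path. Whichever kind we pick, the same kind must appear on both sides of the lexer/parser contract.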

Stopping Errors Early: Making Lexing Errors Fatal

Next up, let's talk about how Nanus handles errors at the lexing stage – specifically, why we need to make these errors fatal and properly formatted. Right now, if the fons/nanus/lexer.ts encounters an unknown or unexpected character, its default behavior is pretty forgiving, perhaps a little too forgiving. It will typically print an error message directly to stderr (you know, console.error(... unexpected character ...)), and then, crucially, it continues processing the input. Sounds nice and resilient, right? Well, in the context of a compiler, this "carry on regardless" attitude is actually a recipe for disaster, guys.

Imagine a scenario where the lexer hits an unexpected symbol. It flags it, prints a message, and then tries to move on. What happens next? The parser, which expects a clean stream of valid tokens, suddenly receives a corrupted or incomplete sequence. This doesn't just lead to one error; it can create a chain reaction of cascaded parse errors. One tiny mistake early on can cause dozens, if not hundreds, of subsequent errors to be reported, none of which truly point to the original problem. It's like finding a small crack in the foundation of a building, ignoring it, and then being surprised when the entire roof collapses – the initial problem was small, but its unaddressed nature led to catastrophic consequences. This isn't just about cosmetic issues; it can lead to the compiler producing wrong output, or even worse, silently incorrect output that goes unnoticed until runtime, leading to incredibly hard-to-debug problems in generated code. We want our compiler to fail fast and loudly when it encounters a show-stopping issue, not try to limp along and produce garbage. This "fail fast" principle is paramount in compiler design because it helps developers pinpoint the exact source of an error immediately, saving valuable debugging time and preventing faulty code from ever being generated.

The current system, where errors are dumped to stderr and compilation proceeds, makes it incredibly difficult for developers to understand what's truly going wrong. The error messages might be there, but they're often unformatted, lacking crucial context like filename, line number, and column. This absence of precise locus information means you have to manually hunt for the error in your source file, which is a massive productivity drain. We need a system that clearly and precisely points to the exact spot where the problem occurred, just like a seasoned detective pointing directly to the scene of the crime.

So, what’s the fix here? Simple: we need to transition from printing to stderr and continuing, to throwing a structured error. The ideal candidate for this is our existing CompileError (from fons/nanus/errors.ts). By throwing a CompileError, we can embed all the necessary locus information (filename, line, column, and even a snippet of the problematic source code) directly into the error object. This allows our formatError() utility to truly shine. Instead of a generic console.error, we get a beautifully formatted, precise error message that tells the developer exactly where the problem is, what the problem is, and why it's a problem. This means an immediate halt to compilation upon the first severe lexing error, preventing cascades and ensuring that developers are presented with clear, actionable feedback. This change significantly improves the developer experience by making compiler errors not just visible, but understandable and actionable. It’s a fundamental shift towards a more robust and user-friendly compiler, where errors are allies in the debugging process, not confusing obstacles.
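Here's a sketch of that shift from print-and-continue to throw-and-abort. The class shape and the formatError() rendering below are assumptions modelled on the description of fons/nanus/errors.ts, not its actual code:

```typescript
// Hypothetical CompileError shape carrying full locus information.
class CompileError extends Error {
  constructor(
    public filename: string,
    public line: number,
    public column: number,
    message: string,
  ) {
    super(message);
    this.name = "CompileError";
  }
}

// Before: console.error(`unexpected character '${ch}'`) and keep lexing.
// After: abort immediately, so the parser never sees a corrupted stream.
function unexpectedCharacter(filename: string, line: number, column: number, ch: string): never {
  throw new CompileError(filename, line, column, `unexpected character '${ch}'`);
}

// With structured fields, formatting can point at the exact spot.
function formatError(e: CompileError, sourceLine: string): string {
  const caret = " ".repeat(e.column - 1) + "^";
  return `${e.filename}:${e.line}:${e.column}: error: ${e.message}\n${sourceLine}\n${caret}`;
}
```

The caret line is exactly the kind of source pointer our acceptance criteria ask for: one fatal, precisely located message instead of a cascade.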

Refining Consistency: Smoothing Out Nanus's Edges

Now that we’ve covered the P0 issues, let's move onto the P1 fixes. These are crucial for improving the consistency and predictability of Nanus, making it a much more pleasant and reliable tool to work with. These might not bring the compiler to a grinding halt like P0 issues, but they introduce subtle bugs, unexpected behaviors, and increase the cognitive load for anyone trying to understand or extend Nanus. It’s all about polishing the rough edges and ensuring a smooth, logical flow throughout the compiler’s operations.

Clean Termination: Fixing Inconsistent Statement Rules

Let's dive into an area that often causes subtle headaches in compiler design: statement termination rules. In Nanus, we currently have an inconsistency that needs sorting out for better predictability and cleaner code. Here’s the deal, guys: the prepare() function in fons/nanus/lexer.ts is designed to drop Newline tokens. This means that by the time the token stream reaches the parser, those explicit newline characters that separate statements in source code are already gone. Furthermore, the expectNewline() function in fons/nanus/parser.ts, which you might expect to enforce statement termination, is effectively a no-op. It literally does nothing. It's like having a traffic cop that just waves everyone through regardless of the light; it’s there, but it doesn’t enforce anything.

Now, here's the kicker: despite newlines being dropped and expectNewline() being a no-op, many statement parsers still call expectNewline() to terminate statements. You'll see this in functions like parseVaria, parseExpressiaStmt, and others. This creates a really confusing situation. On one hand, the design suggests newlines aren't significant for statement termination because they're removed early. On the other hand, the parser explicitly tries to "expect" them. This leads to inconsistent statement termination behavior. If a developer expects a newline to implicitly terminate a statement, but the parser's logic for expectNewline() doesn't actually do anything, they might find their code silently failing or merging statements in unexpected ways. This kind of ambiguity leads to fragile parsing logic, making it harder to reason about how the compiler will interpret source code, and significantly complicates future changes or extensions to the grammar. It's a prime example of a contract violation between what the parser says it does (by calling expectNewline()) and what it actually does (nothing). This lack of a clear, consistent rule makes the language harder to use and the compiler harder to maintain. We need to define a single, unambiguous policy for how statements are terminated.

The fix boils down to choosing a single, clear rule and sticking to it. We have two main paths, both valid, but we must pick one and implement it consistently. The first option is to keep newlines and use them as terminators in a small set of contexts. The lexer would no longer drop newlines, and expectNewline() would become an actual, meaningful function that asserts the presence of a newline token. This makes newlines syntactically significant, acting as explicit statement delimiters much like semicolons do in some languages. The second option is to make termination purely structural. Newlines remain insignificant for termination, and statement boundaries are enforced by checking for known delimiters; think of languages that use braces ({}), parentheses (()), or keywords (end, else) to define the extent of a statement or block. In this scenario, expectNewline() would be removed entirely (or redefined to purely handle whitespace), and statement parsers would instead check for a closing brace (}), EOF (end of file), or other structural markers to determine where a statement ends. This keeps the lexer simple by leaving newlines out of the token stream, but shifts responsibility for termination entirely to the parser's structural understanding of the code.
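To illustrate the purely structural option, here's a tiny sketch of what would replace the no-op expectNewline(). The token kinds and the keyword list are illustrative guesses, not the actual Nanus grammar:

```typescript
// Sketch of structural statement termination: a statement ends at a known
// delimiter, not at a (dropped) newline. Keyword names here are guesses
// based on the Latin-flavoured parser functions (parseVaria, parseFunctio).
type Token = { kind: "Punctuator" | "Keyword" | "Operator" | "EOF"; value: string };

const STATEMENT_STARTERS = new Set(["varia", "functio"]);

// Replaces expectNewline(): the parser asks whether the next token
// structurally closes the current statement.
function atStatementEnd(next: Token): boolean {
  return (
    next.kind === "EOF" ||
    (next.kind === "Punctuator" && next.value === "}") ||
    (next.kind === "Keyword" && STATEMENT_STARTERS.has(next.value))
  );
}
```

Under the first option, the equivalent helper would instead assert a real Newline token that the lexer no longer drops. Either way, the check actually does something, which is the whole point.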

Regardless of which path we choose, the critical part is to eliminate the current inconsistency. This will lead to much more deterministic and understandable statement termination, making Nanus more robust, easier to parse, and ultimately, a more reliable compiler for everyone involved. It simplifies the grammar, reduces cognitive load for developers, and makes the compiler's behavior predictable, which is exactly what we want from a high-quality compiler.

Error Unification: A Single Strategy for All Errors

Alright, let's talk about something super important for a smooth development experience: error handling. Right now, guys, Nanus has a bit of a chaotic approach to reporting issues, and it’s creating unnecessary friction. We're seeing a fragmented error strategy that makes debugging and maintaining the compiler much harder than it needs to be. This needs to change.

Currently, if you look at the parser, it tends to throw generic new Error("line:col: msg") strings. These errors are then expected to be caught and processed by our formatError() utility, which relies on regex parsing to extract the line, column, and message from these ad-hoc strings. While formatError() tries its best to make sense of these, relying on regex for error parsing is inherently brittle. If someone changes the error string format slightly, formatError() breaks, and suddenly all our error messages become unreadable or misleading. This is not a robust system; it’s a hacky workaround that invites future problems.

Compounding this, as we discussed earlier, the lexer sometimes bypasses structured error handling entirely and just prints directly to stderr. This means some errors are captured and potentially formatted, while others are just dumped unceremoniously, leading to a very inconsistent user experience. A developer might see a nicely formatted error for a parse issue, then a raw, unformatted console.error for a lexing problem, which is confusing and unprofessional.

And here’s the kicker: we actually have a dedicated error class, CompileError, defined in fons/nanus/errors.ts. This class is specifically designed to hold all the crucial locus information (filename, line, column) and the actual error message in a structured, programmatic way. But, guess what? It isn’t consistently used across the lexer and parser. It’s like having a perfectly designed emergency exit sign but still having people run around looking for a way out because nobody bothered to point them to it. This inconsistency is a major pain point because it means we’re not leveraging our own best practices, leading to a patchwork of error reporting mechanisms that are difficult to manage and even harder to debug. The lack of a unified error shape means that any tool or process trying to consume errors from Nanus has to deal with multiple formats, increasing complexity and potential for breakage. It makes automated error reporting, IDE integrations, and even just simple command-line output far less effective. This fragmentation directly impacts developer productivity and the perceived quality of the compiler.

So, the fix is clear and crucial: we need to standardise on one error shape. The obvious choice is our existing, purpose-built CompileError. We should ensure that both the lexer and the parser consistently throw instances of CompileError, populated with all the relevant locus (filename, line, column) information. This means no more generic Error strings relying on regex parsing, and no more direct prints to stderr. Every error originating from the Nanus compilation process should be a CompileError. By doing this, we can firmly establish formatError() as the sole presentation layer for all compiler errors. It will receive a well-defined CompileError object and can then render it beautifully and consistently, providing a clear filename, line number, column, and even a snippet of the problematic source code. This standardization will massively improve the diagnostics and user experience for Nanus. It means developers will always receive clear, actionable, and consistently formatted error messages, making debugging much faster and less frustrating. It's a critical step towards a more professional and user-friendly compiler.
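A small sketch of what "one error shape, one presentation layer" means in practice. The names are illustrative; the real class lives in fons/nanus/errors.ts:

```typescript
// Every compiler error is a CompileError; formatError() is the only place
// that turns one into text. No regex over "line:col: msg" strings needed,
// because the fields are already structured.
class CompileError extends Error {
  constructor(
    public filename: string,
    public line: number,
    public column: number,
    message: string,
  ) {
    super(message);
    this.name = "CompileError";
  }
}

function formatError(e: unknown): string {
  if (!(e instanceof CompileError)) {
    // Anything else reaching the presentation layer is itself a compiler
    // bug, and is reported as such rather than silently reformatted.
    return `internal error: ${String(e)}`;
  }
  return `${e.filename}:${e.line}:${e.column}: error: ${e.message}`;
}
```

Because the lexer and parser both throw the same class, tools consuming Nanus errors (CLI output, IDE integrations, CI) only ever have to handle one shape.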

Polishing the Details: Minor Improvements for Robustness

Now let’s move on to some of the P2 issues. These might seem smaller in isolation, but collectively, they contribute significantly to the overall robustness, maintainability, and future-proofing of Nanus. Addressing these details ensures that our compiler is not just functional, but also clean, efficient, and ready for whatever comes next, especially as we consider porting to Go. These are the kinds of optimizations that, while not immediately show-stopping, pay huge dividends in the long run by preventing technical debt and streamlining the development process.

Naming Rights: Handling Keywords as Identifiers

Alright, let's talk about a subtly tricky area that often catches developers off guard: keyword-as-identifier policy. This is about how Nanus decides if a word is a special keyword (like if, for, function) or a plain old identifier (like a variable name myVar or a function name calculate). The issue here, guys, is that our parser currently has an inconsistent policy.

Sometimes, the parser seems to be pretty relaxed, allowing what looks like a keyword to be used as an identifier via functions like expectName(). This can be a double-edged sword. While it might seem flexible, it can lead to ambiguity. For example, if var is a keyword to declare variables, but you could also name a variable var, how does the parser tell the difference? This kind of relaxed parsing can make the grammar more complex and prone to misinterpretation. On the flip side, other contexts strictly require an Identifier token only. A great example of this is in import statements, where the names being imported absolutely must be identifiers and not keywords. This mixed approach leads to confusion for both developers trying to write code in Nanus and for those maintaining the compiler itself. It’s not clear when a keyword is truly a keyword and when it can be repurposed as a name. This ambiguity in the grammar rules can lead to unexpected parsing errors or, even worse, silently incorrect interpretations of source code.

But here's where the problem really bites us: the emitter outputs identifiers verbatim. What does this mean? It means if Nanus allows you to use a name like interface or delete as an identifier (perhaps because it's allowed in Nanus's syntax), and then the emitter simply prints that name directly into the target language (like TypeScript), we have a huge problem. These are often reserved words in TypeScript (or JavaScript), and using them as identifiers will result in invalid TS output. Your generated code will immediately fail to compile in TypeScript, forcing developers to manually go in and fix the output or restructure their Nanus code to avoid these clashes. This creates a significant portability and compatibility issue, making the generated code less reliable and requiring manual intervention, which defeats the purpose of automatic code generation. It’s like speaking two different languages but using the same word for entirely different concepts – confusion is guaranteed. We want our generated code to be valid and clean, not something that requires post-processing or manual fixes.

So, the fix involves making a clear, end-to-end policy decision for how keywords and identifiers interact. We basically have two main options, and we need to choose one and apply it consistently across the entire compiler pipeline, from lexer to parser to emitter. The first option is to disallow names that would break TypeScript at parse time. This means the parser would have a strict list of reserved words (including TS reserved words) that cannot be used as identifiers in Nanus. If a developer tries to use interface as a variable name, the Nanus parser would immediately flag it as an error, preventing the problem from ever reaching the emitter. This approach shifts the burden of correctness to the Nanus source code itself, ensuring that any code written in Nanus is inherently compatible with its target environment. The second option is to implement escaping or renaming rules in the emitter. Under this strategy, the Nanus parser would be more permissive, allowing certain keywords to be used as identifiers within Nanus. However, when the emitter translates this code to TypeScript, it would detect these problematic identifiers and automatically escape or rename them. For example, interface might become _interface or nanus_interface in the generated TypeScript. This approach allows more flexibility in Nanus's syntax but adds complexity to the emitter, requiring careful implementation of robust renaming strategies to avoid name collisions.
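Here's a minimal sketch of the second option, renaming in the emitter. The reserved-word list and the underscore-prefix rule are assumptions for illustration, not the decided Nanus policy:

```typescript
// Sketch: the emitter escapes identifiers that would collide with
// TypeScript reserved words. The list below is a partial, illustrative
// subset; a real implementation would use the full TS reserved-word set.
const TS_RESERVED = new Set([
  "interface", "delete", "class", "enum", "export",
  "import", "new", "typeof", "var", "let", "const",
]);

function emitIdentifier(name: string): string {
  // Option one would instead reject these names at parse time, so the
  // emitter never sees them.
  return TS_RESERVED.has(name) ? `_${name}` : name;
}
```

Note that a robust version must also guard against collisions, for example a user identifier that is already named _interface; that's the extra emitter complexity the second option buys into.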

Regardless of the chosen path, the critical outcome must be that the generated output is always valid and compilable in the target language. This policy clarification will eliminate a major source of frustration and manual rework, making Nanus a much more reliable and professional tool for code generation, and ultimately enhancing the overall developer experience. It ensures that the contract between Nanus and its target language (TypeScript) is always upheld.

Emitter Tweaks: Small Fixes, Big Impact

Finally, let's zoom in on a couple of minor but important issues within the fons/nanus/emitter.ts module. These are the kinds of details that, while not catastrophic, represent areas of inefficiency or potential for drift that we should address to maintain a clean and robust codebase. Sometimes, the smallest fixes can have a big impact on long-term maintainability and performance, preventing subtle bugs from creeping in.

First up, in the Pactum (contract) generation logic, the emitter currently computes an unused async string. If you look closely, you'll see logic creating this async keyword, but then it's simply discarded or never actually applied where it might be expected. This is problematic for a few reasons. Not only is it dead code, consuming processing cycles for no benefit, but more importantly, TypeScript interfaces (which Pactum likely translates to or influences) can't have async methods anyway. Interfaces define the shape of an object and its methods, not their implementation details or execution context. So, generating or even considering async for interface methods is fundamentally mismatched with TypeScript's type system. This indicates a slight misunderstanding or an outdated assumption in the emitter's logic regarding how Pactum should translate to TypeScript. The existence of this unused computation signals an inefficiency and a potential source of confusion for anyone trying to understand or modify the emitter in the future. It’s a small, precise example of wasted effort and a contract violation with the target language’s capabilities. We want our emitter to be lean, mean, and perfectly aligned with the target language's specifications, ensuring that every piece of generated code is both necessary and valid.

The second minor issue relates to the METHOD_MAP. This METHOD_MAP is currently hardcoded within the emitter. While this might seem convenient initially, it introduces a significant risk of drift from norma mappings in the main pipeline. norma likely represents the canonical definitions or a central source of truth for how methods should be mapped or transformed. If METHOD_MAP is hardcoded in the emitter and norma changes its mappings, the emitter won't automatically pick up those changes. This can lead to subtle but dangerous inconsistencies: Nanus might emit code that relies on an outdated or incorrect method mapping, leading to runtime errors or unexpected behavior in the generated code. It breaks the "single source of truth" principle, where norma should ideally dictate such mappings. Hardcoding these values means that every time norma is updated, someone has to remember to manually update METHOD_MAP in the emitter, which is prone to human error and adds unnecessary maintenance overhead. We want our compiler to be robust and self-consistent across all its modules.

The fix for the unused async string is straightforward: remove the computation entirely. It's dead code, doesn't serve a purpose, and trying to apply async to interface methods is semantically incorrect in TypeScript. Eliminating it cleans up the codebase and removes a point of potential confusion. For the METHOD_MAP, the solution involves de-hardcoding it. Instead of being a static, fixed map within the emitter, it should ideally be injected or dynamically referenced from norma (or a shared configuration source derived from norma). This ensures that the emitter always uses the most up-to-date and canonical method mappings, preventing any discrepancies or drift. By making METHOD_MAP dynamically sourced, we eliminate the risk of inconsistencies, reduce maintenance burden, and ensure that the emitter remains perfectly synchronized with the broader project's conventions. These seemingly small emitter tweaks contribute significantly to the long-term maintainability, correctness, and efficiency of Nanus, ensuring it continues to be a high-quality compiler without hidden gotchas. They are essential steps towards a more robust and future-proof compiler design.
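The injection idea can be sketched like this. The Emitter shape and the norma-derived map contents are hypothetical stand-ins for the real pipeline wiring:

```typescript
// Sketch of de-hardcoding METHOD_MAP: the map is supplied by the pipeline
// (derived from norma) rather than baked into the emitter, so updates to
// norma propagate automatically. Method names below are invented examples.
type MethodMap = ReadonlyMap<string, string>;

class Emitter {
  constructor(private methodMap: MethodMap) {}

  emitMethodCall(recv: string, method: string, args: string): string {
    // Fall back to the original name when norma defines no mapping.
    const mapped = this.methodMap.get(method) ?? method;
    return `${recv}.${mapped}(${args})`;
  }
}
```

The pipeline then constructs the emitter with something like new Emitter(mapFromNorma), and there is exactly one place where the mappings are defined.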

What's Next? Acceptance Criteria and Future Vision

So, guys, we’ve covered a lot of ground here, detailing the crucial enhancements needed to stabilize and optimize our Nanus compiler. These aren't just wish-list items; they are concrete steps towards making Nanus the reliable, high-performance foundation that Rivus deserves. To ensure we're all aligned and know when these improvements are truly "done," let's lay out some clear acceptance criteria.

First off, regarding the generics issue, we absolutely need to add a minimal Nanus corpus test that includes generics. This test should fail before the fix is implemented, clearly demonstrating the current parser/lexer mismatch. Then, crucially, it must pass after the fix, confirming that our < and > tokenization and parsing are working as expected and unlocking generic type usage. This is a foundational test that verifies the core functionality for advanced type constructs.

Secondly, for our error handling, we need to ensure lexer errors abort compilation with a properly formatted message. This means no more silent failures or raw stderr dumps. Every lexing error should immediately stop the compilation process and present the developer with a clear, concise, and structured error message. This message must include the filename, line number, column, and a source pointer (like a caret ^ indicating the exact error location). This will drastically improve the debugging experience and uphold our "fail fast, fail clear" principle.

And finally, to address the inconsistent statement termination, we must ensure statement termination is deterministic even when newlines are removed. This means that whichever strategy we choose (explicit newlines or purely structural termination), the compiler's behavior must be predictable and consistent. There should be no ambiguity about where a statement ends, regardless of how whitespace is handled. This will eliminate a subtle source of bugs and make the Nanus grammar much easier to reason about and extend.

By meeting these acceptance criteria, we won't just be fixing bugs; we’ll be building a stronger, more reliable, and more user-friendly Nanus compiler. The value proposition here is immense: a stable, robust Nanus that’s ready to tackle the complexities of Rivus development without hidden pitfalls. This proactive engineering effort ensures a smoother development journey, reduces debugging time, and ultimately accelerates our progress towards a fully functional and high-quality Rivus ecosystem. It's about empowering our developers with tools that work as expected, every single time. Let’s get it done!