Compiler Design

Breaking the Syntax Barrier: Building Sakshyar, Nepal's First Nepali Programming Language

The journey of creating Sakshyar, Nepal's first programming language in Nepali, from tokenization to code generation. Because programming shouldn't be limited by language barriers.

compiler-design programming-language nepali tokenization parser code-generation

During my third year of college, I found myself preoccupied with a question that felt both radical and necessary: Why should the mastery of logic be gated by the mastery of English?

This project didn’t start in a vacuum. The initial spark came from a conversation with my good friend Nirav. He had the brilliant vision of localized syntax, and his excitement was infectious. It made me realize that while programming is universal, its entry point isn’t. This led to the creation of Sakshyar: a college project experiment, not a production-level compiler, but a proof-of-concept to explore whether Nepali syntax could lower the barrier to entry for programming.

The Philosophy of Sakshyar

I couldn’t help but wonder: when I watched juniors struggle, was it really about them lacking “computer brains,” or was something else at play? Maybe they weren’t struggling with logic itself, but with a double translation—first translating a real-world problem into logic, then that logic into English keywords like while, function, and return. I wasn’t certain, but the hypothesis felt worth testing.

The experiment was straightforward: build a minimal compiler that transpiled Nepali keywords to C, and see if it made a difference. In truth, we never got to properly verify whether localized syntax actually helped students learn faster or think more clearly. The project scope was limited, and we lacked the resources for a proper study. But what we did learn was invaluable—not about whether the hypothesis was right or wrong, but about how compilers work, how languages are structured, and how much complexity hides behind the syntax we take for granted.

The Grammar of the Soil

In this version of Sakshyar, we moved closer to the natural cadence of the Nepali language. We introduced strict typing and specific control structures that feel intuitive.

Core Keywords & Data Types

Concept | Sakshyar Keyword | Usage
Numbers/Decimals | अंक | अंक संख्या = ५
Strings | अक्षर | अक्षर नाम = "Sammelan"
Function | परिभाषा | Defining a block of logic
Return | फिर्ता | Exiting a function with a value
Output | लेख | Predefined function for standard output
Input | ल्याउ | Predefined function for standard input

Logic & Control Flow

Our syntax uses the गर (Do) keyword to close blocks, making it read like a set of instructions rather than abstract symbols.

Conditionals

यदि k == 5 भए
    // logic here
नत्र k == 6 भए
    // logic here
नत्र
    // fallback logic
गर
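As a rough illustration of where such a chain could end up after translation, here is a small Python sketch that emits a C if / else if / else ladder. The helper name emit_if, and the choice to pass conditions and bodies as ready-made C strings, are assumptions made for this example; the article does not show Sakshyar's actual generator, nor the language it was written in.

```python
def emit_if(branches, else_body=None, indent="    "):
    """Emit a C if / else-if / else ladder.

    branches:  list of (condition, [statement, ...]) pairs, already in C syntax.
    else_body: optional list of statements for the trailing नत्र block.
    """
    if not branches:
        raise ValueError("an यदि chain needs at least one condition")
    lines = []
    for i, (cond, body) in enumerate(branches):
        # The first pair opens the chain; later pairs close the previous block.
        head = f"if ({cond}) {{" if i == 0 else f"}} else if ({cond}) {{"
        lines.append(head)
        lines.extend(indent + stmt for stmt in body)
    if else_body is not None:
        lines.append("} else {")
        lines.extend(indent + stmt for stmt in else_body)
    lines.append("}")  # the single closing गर maps to the final brace
    return "\n".join(lines)


print(emit_if([("k == 5", ["a();"]), ("k == 6", ["b();"])], ["c();"]))
```

Note that one गर closes the whole chain in Sakshyar, while the C output needs a brace per branch, so the generator has to add the intermediate closing braces itself.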

Loops

Sakshyar supports both “Until” and “While” logic, matching how we speak:

Until: K == 5 नभए सम्म (Until K is 5)

While: K < 5 भए सम्म (While K is less than 5)

Break: निस्क (Exit)

Continue: जाउ (Go)
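The four loop forms above could be lowered to C along these lines. This is a Python sketch under stated assumptions: emit_loop and LOOP_CONTROL are hypothetical names, conditions and statements are passed in as C strings, and निस्क and जाउ are assumed to map directly to C's break and continue.

```python
# Assumed one-to-one mapping of the loop-control keywords to C statements.
LOOP_CONTROL = {"निस्क": "break;", "जाउ": "continue;"}


def emit_loop(condition, body, until=False):
    """Emit a C while loop; until=True models नभए सम्म by negating the test."""
    # Parenthesize before negating: !(K == 5), not !K == 5,
    # because unary ! binds tighter than == in C.
    test = f"!({condition})" if until else condition
    lines = [f"while ({test}) {{"]
    for stmt in body:
        lines.append("    " + LOOP_CONTROL.get(stmt, stmt))
    lines.append("}")
    return "\n".join(lines)


print(emit_loop("K == 5", ["K = K + 1;"], until=True))
print(emit_loop("K < 5", ["निस्क"]))
```

Both loop keywords end up as the same C construct; the only difference is whether the condition is wrapped in a negation.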

Compiler Architecture: Under the Hood

Building a compiler from scratch is a humbling exercise in detail. You aren’t just writing code; you are building the machine that understands code.

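  1. Lexical Analysis (The Tokenizer) The first challenge was Unicode. Devanagari text is complex: a single visible character is often stored as several Unicode code points. The conjunct “क्ष”, for example, is a sequence of three code points, and a nukta letter such as “फ़” can be encoded either as one precomposed code point or as a base letter followed by a combining nukta. Our lexer had to normalize these into a consistent form and turn them into a stream of tokens before the compiler could even begin to “read.”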

  2. Syntax Analysis (Parsing) Once we had a stream of tokens, the next step was to understand their structure. We implemented a Recursive Descent Parser that reads tokens and builds a parse tree according to our language’s grammar. To formalize this grammar, we defined Sakshyar in BNF (Backus-Naur Form), which served as our blueprint—ensuring that every यदि eventually found its matching गर, and that expressions were properly nested and evaluated.
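Returning to the tokenizer in step 1, the normalize-then-scan idea can be sketched in Python as follows. This is an illustration only: the article does not say which language or normalization form Sakshyar used, so NFC, the token kinds, and the names tokenize and is_word_char are all assumptions. Multi-word keywords such as नभए सम्म are emitted here as two separate keyword tokens.

```python
import unicodedata

# Keywords from the tables above, normalized the same way as the input.
KEYWORDS = {
    unicodedata.normalize("NFC", word)
    for word in ("अंक", "अक्षर", "परिभाषा", "फिर्ता", "लेख", "ल्याउ",
                 "यदि", "भए", "नभए", "सम्म", "नत्र", "गर", "निस्क", "जाउ")
}
TWO_CHAR_OPS = ("==", "!=", ">=", "<=")
ONE_CHAR_OPS = "=+-*/<>!(),"


def is_word_char(ch):
    # Letters (L*) and combining marks (M*) both belong to a Devanagari word;
    # the marks cover vowel signs, the virama, and the nukta.
    return ch == "_" or unicodedata.category(ch)[0] in "LM"


def tokenize(source):
    text = unicodedata.normalize("NFC", source)
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith("//", i):           # line comment
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        elif ch == '"':                           # string literal
            end = text.find('"', i + 1)
            if end == -1:
                raise SyntaxError("unterminated string")
            tokens.append(("STRING", text[i + 1:end]))
            i = end + 1
        elif ch.isdecimal():                      # ASCII or Devanagari digits
            j = i
            while j < len(text) and text[j].isdecimal():
                j += 1
            tokens.append(("NUMBER", int(text[i:j])))
            i = j
        elif is_word_char(ch):
            j = i
            while j < len(text) and (is_word_char(text[j]) or text[j].isdecimal()):
                j += 1
            word = text[i:j]
            tokens.append(("KEYWORD" if word in KEYWORDS else "IDENT", word))
            i = j
        elif text[i:i + 2] in TWO_CHAR_OPS:
            tokens.append(("OP", text[i:i + 2]))
            i += 2
        elif ch in ONE_CHAR_OPS:
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


print(tokenize("अंक संख्या = ५"))
```

The word test uses Unicode categories rather than a regular-expression \w, because Python's \w does not match combining marks and would split a word like फिर्ता at its vowel signs.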

Here’s the complete grammar specification:

Program      ::= Function* EOF
Function     ::= "परिभाषा" "(" Parameters? ")" IDENTIFIER Block "गर"
Parameters   ::= Type IDENTIFIER ("," Type IDENTIFIER)*
Type         ::= "अंक" | "अक्षर"
Statement    ::= IfStmt | WhileStmt | UntilStmt | ReturnStmt | PrintStmt | InputStmt | CallStmt | BreakStmt | ContinueStmt
Block        ::= Statement*
IfStmt       ::= "यदि" Expression "भए" Block ("नत्र" Expression "भए" Block)* ("नत्र" Block)? "गर"
WhileStmt    ::= Expression "भए सम्म" Block "गर"
UntilStmt    ::= Expression "नभए सम्म" Block "गर"
ReturnStmt   ::= Expression "फिर्ता"
PrintStmt    ::= (Expression ("," Expression)*)? IDENTIFIER? "लेख"
InputStmt    ::= IDENTIFIER "ल्याउ"
BreakStmt    ::= "निस्क"
ContinueStmt ::= "जाउ"
Expression   ::= Equality
Equality     ::= Comparison (("==" | "!=") Comparison)*
Comparison   ::= Term ((">" | ">=" | "<" | "<=") Term)*
Term         ::= Factor (("+" | "-") Factor)*
Factor       ::= Unary (("*" | "/") Unary)*
Unary        ::= ("!" | "-") Unary | Primary
Primary      ::= NUMBER | STRING | IDENTIFIER | "(" Expression ")" | CallStmt
CallStmt     ::= (Expression ("," Expression)*)? IDENTIFIER
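To show how the expression rules translate into recursive-descent code, here is a minimal Python sketch with one method per rule from Expression down to Primary. It is not the Sakshyar parser: the class and function names are hypothetical, tokens are plain strings, the tree is built from tuples, and the CallStmt alternative of Primary is left out, since a postfix call that begins with an expression needs lookahead that the article does not describe.

```python
OPERATORS = {"==", "!=", ">", ">=", "<", "<=", "+", "-", "*", "/", "!", ")"}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, *ops):
        # Consume and return the next token if it is one of ops.
        if self.peek() in ops:
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    def binary(self, operand, ops):
        # Shared shape of  Rule ::= Operand ((op) Operand)*  (left-associative).
        node = operand()
        while True:
            op = self.match(*ops)
            if op is None:
                return node
            node = (op, node, operand())

    def expression(self):
        return self.equality()

    def equality(self):
        return self.binary(self.comparison, ("==", "!="))

    def comparison(self):
        return self.binary(self.term, (">", ">=", "<", "<="))

    def term(self):
        return self.binary(self.factor, ("+", "-"))

    def factor(self):
        return self.binary(self.unary, ("*", "/"))

    def unary(self):
        op = self.match("!", "-")
        if op is not None:
            return (op, self.unary())
        return self.primary()

    def primary(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if tok == "(":
            node = self.expression()
            if self.match(")") is None:
                raise SyntaxError("expected ')'")
            return node
        if tok.isdecimal():
            return ("num", int(tok))
        if tok in OPERATORS:
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("var", tok)


def parse_expression(tokens):
    parser = Parser(tokens)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse_expression("k + 2 * 3 == 7".split()))
```

Operator precedence falls out of the call order: each rule calls the next-tighter rule for its operands, so the multiplication is grouped before the addition, and the addition before the comparison.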

  3. Code Generation (The Bridge to C) To ensure Sakshyar was fast and portable, I chose C as the target language. The Sakshyar compiler reads Nepali, builds an Abstract Syntax Tree (AST), and then “transpiles” it into optimized C code.

Implementation Example: The Addition Function

Here is how a standard function and call looks in Sakshyar:

Function Definition

// definition of addition
परिभाषा (अंक ka, अंक kha) जोड
    अंक ga = ka + kha
    ga फिर्ता
गर
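For a sense of what the generated C for this definition might look like, here is a Python sketch that lowers the two statement shapes used above. Several things are assumptions made for the example: the type mapping (अंक to double, अक्षर to const char *), the double return type (the grammar does not declare one), the ASCII function name jod standing in for जोड, and the helper names themselves. Statements arrive as pre-split token lists rather than from a real parser.

```python
# Assumed mapping from Sakshyar types to C types.
C_TYPES = {"अंक": "double", "अक्षर": "const char *"}


def declare(sakshyar_type, rest):
    ctype = C_TYPES[sakshyar_type]
    # Pointer types attach directly to the name: const char *name.
    return ctype + rest if ctype.endswith("*") else f"{ctype} {rest}"


def emit_statement(tokens):
    if tokens[-1] == "फिर्ता":            # postfix return: ga फिर्ता
        return "return " + " ".join(tokens[:-1]) + ";"
    if tokens[0] in C_TYPES:               # declaration: अंक ga = ka + kha
        return declare(tokens[0], " ".join(tokens[1:])) + ";"
    raise NotImplementedError("statement form not covered by this sketch")


def emit_function(name, params, body, return_type="double"):
    args = ", ".join(declare(t, n) for t, n in params) or "void"
    lines = [f"{return_type} {name}({args}) {{"]
    lines += ["    " + emit_statement(stmt) for stmt in body]
    lines.append("}")
    return "\n".join(lines)


print(emit_function(
    "jod",
    [("अंक", "ka"), ("अंक", "kha")],
    [["अंक", "ga", "=", "ka", "+", "kha"], ["ga", "फिर्ता"]],
))
```

The main reordering is the postfix keyword: Sakshyar puts फिर्ता after the value, while C puts return before it.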

Function Call & Output

// Call with parameters, ६ जोड

// Standard output
"नमस्ते!" लेख

Challenges and Solutions

Identifier Translation: C compilers do not reliably accept Devanagari identifiers, so I implemented a transliteration system that converts Nepali variable names into valid ASCII equivalents behind the scenes.

Semantic Logic: Implementing नभए सम्म (Until) required a logical inversion in the compiler logic to map correctly to C’s while(!condition) structure.

Standard I/O: Making लेख (write) and ल्याउ (read) feel like keywords while they were actually wrappers around printf and scanf required careful handling of variadic arguments.
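Two of the items above, identifier translation and the I/O wrappers, can be sketched in Python as follows. Everything specific here is an assumption for illustration: the partial TRANSLIT table, the sk_ prefix, the hex fallback for unmapped characters, the %g and %s conversions, the space separator and trailing newline in the output, and all function names. The article does not describe Sakshyar's real transliteration scheme or how its wrappers were built.

```python
import unicodedata

# Deliberately partial per-character table; marks are written as escapes.
TRANSLIT = {
    "क": "k", "ख": "kh", "ग": "g", "ज": "j", "ड": "d",
    "न": "n", "म": "m", "य": "y", "स": "s",
    "\u093e": "aa",  # vowel sign aa
    "\u093f": "i",   # vowel sign i
    "\u094b": "o",   # vowel sign o
    "\u0902": "n",   # anusvara
    "\u094d": "",    # virama (joins consonants, no sound of its own)
}


def to_c_identifier(name):
    parts = []
    for ch in unicodedata.normalize("NFC", name):
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            parts.append(ch)
        elif ch in TRANSLIT:
            parts.append(TRANSLIT[ch])
        else:
            parts.append(f"_u{ord(ch):04x}")  # fallback keeps the name ASCII
    # The prefix avoids a leading digit and clashes with C keywords.
    return "sk_" + "".join(parts)


# Assumed printf conversions for the two Sakshyar types.
PRINT_FORMATS = {"अंक": "%g", "अक्षर": "%s"}


def emit_print(args):
    """args: list of (sakshyar_type, c_expression) pairs, in source order."""
    if not args:
        return 'printf("\\n");'
    fmt = " ".join(PRINT_FORMATS[t] for t, _ in args)
    values = ", ".join(expr for _, expr in args)
    return f'printf("{fmt}\\n", {values});'


def emit_input(sakshyar_type, name):
    if sakshyar_type != "अंक":
        raise NotImplementedError("string input needs a buffer policy")
    return f'scanf("%lf", &{name});'  # assumes अंक is a C double


print(to_c_identifier("जोड"))
print(emit_print([("अक्षर", "naam"), ("अंक", "total")]))
```

A table this simple can map two different Nepali names to the same ASCII name, so a real implementation would need a collision check or a reversible encoding. Building the format string from the argument types at compile time is one way to hide printf's variadic interface behind a fixed keyword.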

Impact & Reflection

Sakshyar was showcased at mnsa.cc and received positive feedback from the Nepali developer community, though it remained a proof-of-concept rather than a tool anyone would use in production.

Building this taught me that compiler design is 70% planning and 30% implementation. It demystified the “magic” of programming languages. Now, when I use a high-level language, I don’t just see syntax; I see the lexer, the parser, and the memory management happening underneath. The hypothesis about localized syntax helping students might remain unproven, but the journey of building Sakshyar was its own reward—a deep dive into the machinery that makes programming languages possible.

I am deeply grateful to Nirav for the initial spark. This project, while experimental, made me question assumptions about who programming languages are for and how they should be designed.
