Lexical Structure

The lexical structure of EasyLang defines how your code is broken into tokens.
Tokens are the smallest meaningful pieces of a program — words, numbers, symbols, keywords, operators, etc.

This section explains how EasyLang reads your source code before parsing it into statements and expressions.


Overview

EasyLang’s lexer (tokenizer) splits code into:

  • Keywords
  • Identifiers
  • Numbers
  • Strings
  • Operators
  • Symbols (brackets, parentheses, commas…)
  • Comments
  • Whitespace (ignored except for newlines)

Everything here is based on the actual token rules used in the lexer inside the interpreter.


Comments

Comments are ignored by the interpreter.

Single-line Comments

$ this is a comment

These last until the end of the line.

Multi-line Comments

$$
    This is a multi-line comment
$$

Useful for explanations or temporarily disabling blocks of code.
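
As a rough illustration of the two comment forms, here is a minimal Python sketch of how a lexer could discard them. The helper name strip_comments and both regular expressions are assumptions made for this example, not code from the EasyLang interpreter. The sketch also does not protect a $ that appears inside a string literal.

```python
import re

# Hypothetical patterns, not taken from the EasyLang source.
# The $$ ... $$ form is removed first so its delimiters are not
# mistaken for single-line comments.
MULTI_LINE = re.compile(r"\$\$.*?\$\$", re.DOTALL)
SINGLE_LINE = re.compile(r"\$[^\n]*")

def strip_comments(source: str) -> str:
    """Remove $$...$$ blocks, then $ comments up to the end of the line."""
    without_blocks = MULTI_LINE.sub("", source)
    return SINGLE_LINE.sub("", without_blocks)

code = 'we let x = 10 $ set x\n$$\n  ignored\n$$\nso print x\n'
print(strip_comments(code))
```

Newlines are kept in place, so line-based statement separation still works after the comments are gone.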


Keywords

Keywords are reserved words with special meaning.
You cannot use these as variable names.

we let
so
print 
read
true 
false 
not
equals 
not equals 
less
greater 
plus 
minus
mul 
div 
and
or 
if 
then
else
else if
repeat
while 
from 
to
do 
define 
return
bring 
as 
open
close 
writeline 
readline
for 
into 
with
continue 
break

These keywords map directly to tokens in the lexer.
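
Some keywords span two words (we let, not equals, else if), so a lexer has to try the longer form before the shorter one. The Python sketch below shows one way to do that with a longest-first regular expression. The names KEYWORDS, build_keyword_pattern, and match_keyword are hypothetical, and only a few keywords from the list are included.

```python
import re

# A few entries from the keyword list above (not the full set).
KEYWORDS = ["we let", "not equals", "else if", "else", "not",
            "equals", "so", "print", "if", "then"]

def build_keyword_pattern(keywords):
    """Compile one regex that prefers the longest keyword at a position."""
    alternatives = []
    for keyword in sorted(keywords, key=len, reverse=True):
        words = [re.escape(word) for word in keyword.split(" ")]
        # Allow any run of spaces or tabs between the words of a keyword.
        alternatives.append(r"[ \t]+".join(words))
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")

KEYWORD_RE = build_keyword_pattern(KEYWORDS)

def match_keyword(text, pos=0):
    """Return the keyword starting at pos (single-spaced), or None."""
    match = KEYWORD_RE.match(text, pos)
    if match is None:
        return None
    return " ".join(match.group().split())

print(match_keyword("not equals 5"))
print(match_keyword("not ready"))
```

The word-boundary checks keep names such as notable or some from being read as the keywords not and so.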


Identifiers (Variable Names)

Identifiers refer to variable names, function names, module aliases, and dictionary keys.

Rules:

  • Must start with a letter (A–Z or a–z) or _
  • After that, can include letters, digits, and _

Examples:

x
name
user_age
total2
_value

Invalid Identifiers:

2x (starts with a number)
true (keyword)
if (keyword)
minus (keyword)

If a keyword is used as an identifier, the parser produces a friendly error.
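
The identifier rules can be checked with a short Python sketch. The pattern follows the two rules above. The RESERVED set holds only a handful of keywords for illustration, and the function name is_valid_identifier is an assumption. Whether the real lexer treats keywords case-sensitively is not stated here, so this sketch compares names exactly.

```python
import re

# A letter or underscore first, then letters, digits, or underscores.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Illustrative subset of the keyword list, not the full set.
RESERVED = {"true", "false", "if", "then", "else", "minus", "plus", "so", "print"}

def is_valid_identifier(name: str) -> bool:
    """True when name follows the identifier rules and is not reserved."""
    return IDENTIFIER_RE.fullmatch(name) is not None and name not in RESERVED

for candidate in ["x", "user_age", "total2", "_value", "2x", "true", "minus"]:
    print(candidate, is_valid_identifier(candidate))
```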


Numbers

EasyLang supports integers and floating-point numbers.

Integer examples:

10
0
999

Float examples:

3.14
0.001
10.0

The lexer recognizes a number from a run of digits with an optional decimal point; you do not declare whether a value is an integer or a float.
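
Here is a small Python sketch of number recognition. It assumes a float needs digits on both sides of the decimal point (as in every example above) and that a leading minus sign is handled as a separate operator; neither detail is confirmed by this page. The name read_number is hypothetical.

```python
import re

# Digits, optionally followed by a decimal point and more digits.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

def read_number(text, pos=0):
    """Return (value, end_position) for a number at pos, or None."""
    match = NUMBER_RE.match(text, pos)
    if match is None:
        return None
    lexeme = match.group()
    # A decimal point makes it a float; otherwise it is an integer.
    value = float(lexeme) if "." in lexeme else int(lexeme)
    return value, match.end()

print(read_number("999"))
print(read_number("3.14 plus 1"))
```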


Strings

Strings are always enclosed in double quotes:

"hello"
"EasyLang is cool!"
"123"
"line one\nline two"

Notes:

  • No single-quoted strings
  • No multi-line strings
  • The interpreter removes the quotes and gives you the raw text
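
The notes above can be sketched in Python as follows. The pattern accepts only double-quoted text on a single line and returns the contents without the quotes. The name read_string is hypothetical. The sketch leaves sequences such as \n untouched and does not support escaped quotes, since this page does not say how the interpreter treats them.

```python
import re

# A double quote, any characters except a quote or a line break, a double quote.
STRING_RE = re.compile(r'"[^"\n]*"')

def read_string(text, pos=0):
    """Return (contents_without_quotes, end_position), or None."""
    match = STRING_RE.match(text, pos)
    if match is None:
        return None
    return match.group()[1:-1], match.end()

print(read_string('"hello"'))
print(read_string("'hello'"))  # single quotes are not a string
```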

Boolean Literals

EasyLang supports:

true
false

These map to Python True and False values at runtime.


Symbols & Operators

EasyLang supports two types of symbols:

1. Punctuation / Structural Symbols

Symbol    Meaning
[         start block / list
]         end block / list
{         dictionary start
}         dictionary end
(         function call start
)         function call end
,         argument separator
:         block/function separator
.         attribute or method access

Blocks use:

[
statements...
]

2. Operators

Operators can be written in English words OR in symbol form.

Arithmetic Operators

English form    Symbol    Meaning
plus            +         addition
minus           -         subtraction
mul             *         multiplication
div             /         division

Examples:

a plus b
x minus 5
y mul 3
value / 10

Comparison Operators

English form    Symbol    Meaning
equals          ==        equality
not equals      !=        inequality
less            <         less than
greater         >         greater than
(none)          <=        less-or-equal
(none)          >=        greater-or-equal

Examples:

if x equals 10 then [...]
if name not equals "John" then [...]

Logical Operators

Operator    Meaning
and         logical AND
or          logical OR
not         logical NOT

Example:

if is_admin and logged_in then [...]
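
One way to support both spellings of an operator is to map each form to the same token kind, so the parser never needs to know which one was typed. The Python sketch below does this with a lookup table. Only the PLUS token name appears on this page; every other token name, and the idea that both forms share one token, is an assumption for illustration.

```python
# Hypothetical token names, except PLUS, which appears in the token example.
OPERATOR_TOKENS = {
    "plus": "PLUS", "+": "PLUS",
    "minus": "MINUS", "-": "MINUS",
    "mul": "MUL", "*": "MUL",
    "div": "DIV", "/": "DIV",
    "equals": "EQ", "==": "EQ",
    "not equals": "NEQ", "!=": "NEQ",
    "less": "LT", "<": "LT",
    "greater": "GT", ">": "GT",
    "<=": "LTE", ">=": "GTE",  # symbol form only in the table above
    "and": "AND", "or": "OR", "not": "NOT",
}

def operator_token(lexeme: str):
    """Return the token kind for an operator lexeme, or None if unknown."""
    return OPERATOR_TOKENS.get(lexeme)

print(operator_token("plus"), operator_token("+"))
print(operator_token("not equals"), operator_token("!="))
```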

Whitespace

Whitespace (spaces and tabs) is ignored by the lexer.

Newlines separate statements unless inside blocks.

For example:

we let x = 10
we let y = 20
so print x plus y

Inside a block:

[
we let x = 10
we let y = 20
]

There are no indentation rules; only brackets matter.

Token Examples

Here is how the lexer would tokenize a small program:

we let x = 10
so print x plus 5

Produces tokens like:

WELET     "we let"
ID        "x"
ASSIGN    "="
NUMBER    10
SO        "so"
PRINT     "print"
ID        "x"
PLUS      "plus"
NUMBER    5

This matches the implementation in the lexer of the interpreter.
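
To make the listing above concrete, here is a minimal Python tokenizer that reproduces it for this two-line program. The token names WELET, ID, ASSIGN, NUMBER, SO, PRINT, and PLUS come from the listing; the regular expressions, their order, and the tokenize function are assumptions, and the real lexer covers many more tokens. Newlines are skipped here only because the listing shows no newline token.

```python
import re

# Order matters: keywords are tried before the general ID pattern.
TOKEN_SPEC = [
    ("WELET",  r"we[ \t]+let\b"),
    ("SO",     r"so\b"),
    ("PRINT",  r"print\b"),
    ("PLUS",   r"plus\b|\+"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("SKIP",   r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (kind, value) pairs for the supported subset."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace produces no token in this sketch
        if kind == "NUMBER":
            value = float(text) if "." in text else int(text)
        elif kind == "WELET":
            value = "we let"  # normalize the spacing between the two words
        else:
            value = text
        tokens.append((kind, value))
    return tokens

for kind, value in tokenize("we let x = 10\nso print x plus 5"):
    print(f"{kind:<9} {value!r}")
```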


Summary

EasyLang's lexical structure is simple, consistent, and heavily English-inspired:

  • English keywords
  • Natural operators
  • No punctuation-heavy syntax
  • Strings use quotes
  • Numbers are auto-detected
  • Blocks use [ and ]
  • Comments begin with $ or $$...$$


Next Steps

Continue to Grammar to learn how these tokens form full programs.