Lexical Structure¶
The lexical structure of EasyLang defines how your code is broken into tokens.
Tokens are the smallest meaningful pieces of a program: words, numbers, symbols, keywords, operators, and so on.
This section explains how EasyLang reads your source code before parsing it into statements and expressions.
Overview¶
EasyLang’s lexer (tokenizer) splits code into:
- Keywords
- Identifiers
- Numbers
- Strings
- Operators
- Symbols (brackets, parentheses, commas…)
- Comments
- Whitespace (ignored except for newlines)
Everything here is based on the actual token rules used in the lexer inside the interpreter.
Comments¶
Comments are ignored by the interpreter.
Single-line Comments¶
$ this is a comment
These last until the end of the line.
Multi-line Comments¶
$$
This is a multi-line comment
$$
Useful for explanations or temporarily disabling blocks of code.
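The two comment forms can be illustrated with a small Python sketch. This is not the interpreter's actual code: `strip_comments` is a hypothetical helper, and it assumes that a `$` inside a double-quoted string is ordinary text rather than the start of a comment.

```python
import re

# Hypothetical helper, not the real EasyLang lexer.
# Strings are matched first so that a "$" inside quotes is left alone
# (an assumption; the rules above do not cover this case).
_COMMENT_OR_STRING = re.compile(
    r'"[^"\n]*"'        # double-quoted string (kept as-is)
    r'|\$\$.*?\$\$'     # multi-line comment: $$ ... $$
    r'|\$[^\n]*',       # single-line comment: $ to end of line
    re.DOTALL,
)


def strip_comments(source: str) -> str:
    """Return source with both comment forms removed."""
    def keep_strings(match: re.Match) -> str:
        text = match.group(0)
        return text if text.startswith('"') else ""

    return _COMMENT_OR_STRING.sub(keep_strings, source)
```

For example, `strip_comments('we let x = 10 $ set x\n')` returns `'we let x = 10 \n'`: the comment is dropped and the newline that ends the statement is kept.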
Keywords¶
Keywords are reserved words with special meaning.
You cannot use these as variable names.
we let
so
print
read
true
false
not
equals
not equals
less
greater
plus
minus
mul
div
and
or
if
then
else
else if
repeat
while
from
to
do
define
return
bring
as
open
close
writeline
readline
for
into
with
continue
break
These keywords map directly to tokens in the lexer.
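Some keywords span two words (`we let`, `not equals`, `else if`), so a matcher has to try them before their one-word prefixes. The Python sketch below shows one way to do that with a longest-first alternation. `keyword_at` is a hypothetical name, and allowing several spaces or tabs between the two words is an assumption.

```python
import re

KEYWORDS = [
    "we let", "so", "print", "read", "true", "false", "not", "equals",
    "not equals", "less", "greater", "plus", "minus", "mul", "div", "and",
    "or", "if", "then", "else", "else if", "repeat", "while", "from", "to",
    "do", "define", "return", "bring", "as", "open", "close", "writeline",
    "readline", "for", "into", "with", "continue", "break",
]

# Longest keyword first, so "not equals" wins over "not"
# and "else if" wins over "else".
_KEYWORD_RE = re.compile(
    r"\b(?:"
    + "|".join(
        r"[ \t]+".join(re.escape(word) for word in kw.split())
        for kw in sorted(KEYWORDS, key=len, reverse=True)
    )
    + r")\b"
)


def keyword_at(source: str, pos: int = 0):
    """Return the keyword starting at pos (whitespace normalized), or None."""
    match = _KEYWORD_RE.match(source, pos)
    if match is None:
        return None
    return " ".join(match.group(0).split())
```

The word boundaries keep a keyword from matching inside a longer name: `keyword_at("notable")` returns `None`, while `keyword_at("not equals 5")` returns `"not equals"`.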
Identifiers (Variable Names)¶
Identifiers name variables, functions, module aliases, and dictionary keys.
Rules:¶
- Must start with a letter (A–Z or a–z) or _
- After that, can include letters, digits, and _
Examples:
x
name
user_age
total2
_value
Invalid Identifiers:
2x (starts with a number)
true (keyword)
if (keyword)
minus (keyword)
If a keyword is used as an identifier, the parser produces a friendly error.
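The identifier rules can be checked with a short regular expression. The Python sketch below is illustrative only: `is_valid_identifier` is a hypothetical helper, and it treats only the single-word keywords as reserved, since this page does not say whether `we` or `let` on their own are reserved.

```python
import re

# A letter or underscore, then any mix of letters, digits, and underscores.
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Single-word keywords from the list above (assumption: the parts of
# "we let" are not reserved individually).
RESERVED = {
    "so", "print", "read", "true", "false", "not", "equals", "less",
    "greater", "plus", "minus", "mul", "div", "and", "or", "if", "then",
    "else", "repeat", "while", "from", "to", "do", "define", "return",
    "bring", "as", "open", "close", "writeline", "readline", "for", "into",
    "with", "continue", "break",
}


def is_valid_identifier(name: str) -> bool:
    """True if name follows the identifier rules and is not a keyword."""
    return _IDENT_RE.fullmatch(name) is not None and name not in RESERVED
```

With this check, `user_age` and `_value` are accepted, while `2x` and `minus` are rejected.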
Numbers¶
EasyLang supports integers and floating-point numbers.
Integer examples:¶
10
0
999
Float examples:¶
3.14
0.001
10.0
Numbers are tokenized automatically based on digits and optional decimal points.
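A minimal Python sketch of that rule follows. `read_number` is a hypothetical helper, and the pattern assumes digits are required on both sides of the decimal point; whether the real lexer accepts forms such as `.5` or `5.` is not stated here.

```python
import re

# One or more digits, optionally followed by "." and more digits.
_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def read_number(source: str, pos: int = 0):
    """Return (value, end_position) for a number at pos, or None."""
    match = _NUMBER_RE.match(source, pos)
    if match is None:
        return None
    text = match.group(0)
    # A decimal point makes the literal a float; otherwise it is an integer.
    value = float(text) if "." in text else int(text)
    return value, match.end()
```

For example, `read_number("10")` returns `(10, 2)` and `read_number("3.14 plus 1")` returns `(3.14, 4)`.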
Strings¶
Strings are always enclosed in double quotes:
"hello"
"EasyLang is cool!"
"123"
"line one\nline two"
Notes:¶
- No single-quoted strings
- No multi-line strings
- The interpreter removes the quotes and gives you the raw text
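The notes above can be sketched in Python as follows. `read_string` is a hypothetical helper; it leaves backslash sequences such as `\n` untouched, because this page does not describe how escapes are processed.

```python
import re

# A double quote, any characters except a quote or a real newline,
# then a closing double quote. The group captures the raw text.
_STRING_RE = re.compile(r'"([^"\n]*)"')


def read_string(source: str, pos: int = 0):
    """Return (raw_text, end_position) for a string at pos, or None."""
    match = _STRING_RE.match(source, pos)
    if match is None:
        return None
    return match.group(1), match.end()
```

For example, `read_string('"hello"')` returns `("hello", 7)`, while a single-quoted `'hello'` or a string broken across two lines returns `None`.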
Boolean Literals¶
EasyLang supports:
true
false
These map to Python True and False values at runtime.
Symbols & Operators¶
EasyLang supports two types of symbols:
1. Punctuation / Structural Symbols¶
| Symbol | Meaning |
|---|---|
| [ | start block / list |
| ] | end block / list |
| { | dictionary start |
| } | dictionary end |
| ( | function call start |
| ) | function call end |
| , | argument separator |
| : | block/function separator |
| . | attribute or method access |
Blocks use:
[
statements...
]
2. Operators¶
Operators can be written in English words OR in symbol form.
Arithmetic Operators¶
| English form | Symbol | Meaning |
|---|---|---|
| plus | + | addition |
| minus | - | subtraction |
| mul | * | multiplication |
| div | / | division |
Examples:
a plus b
x minus 5
y mul 3
value / 10
Comparison Operators¶
| English form | Symbol | Meaning |
|---|---|---|
| equals | == | equality |
| not equals | != | inequality |
| less | < | less than |
| greater | > | greater than |
| (none) | <= | less-or-equal |
| (none) | >= | greater-or-equal |
Examples:
if x equals 10 then [...]
if name not equals "John" then [...]
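Because each English operator has a symbol equivalent, a lexer can treat the two spellings as the same operator. The Python sketch below shows one possible normalization. `canonical_operator` is a hypothetical helper, and mapping words to symbols (rather than the reverse) is an arbitrary choice made for illustration.

```python
# English operator forms and their symbol equivalents, from the tables above.
WORD_TO_SYMBOL = {
    "plus": "+",
    "minus": "-",
    "mul": "*",
    "div": "/",
    "equals": "==",
    "not equals": "!=",
    "less": "<",
    "greater": ">",
}


def canonical_operator(op: str) -> str:
    """Map an English operator to its symbol; symbols pass through unchanged."""
    key = " ".join(op.split())  # collapse extra spaces inside "not equals"
    return WORD_TO_SYMBOL.get(key, key)
```

For example, `canonical_operator("plus")` returns `"+"`, and `canonical_operator("<=")` returns `"<="` because that operator has no English form.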
Logical Operators¶
| Operator | Meaning |
|---|---|
| and | logical AND |
| or | logical OR |
| not | logical NOT |
Example:
if is_admin and logged_in then [...]
Whitespace¶
Whitespace (spaces and tabs) is ignored by the lexer.
Newlines separate statements unless inside blocks.
For example:
we let x = 10
we let y = 20
so print x plus y
Inside a block:
[
we let x = 10
we let y = 20
]
There are no indentation rules; only brackets matter.
Token Examples¶
Here is how the lexer would tokenize a small program:
we let x = 10
so print x plus 5
Produces tokens like:
WELET "we let"
ID "x"
ASSIGN "="
NUMBER 10
SO "so"
PRINT "print"
ID "x"
PLUS "plus"
NUMBER 5
This matches the implementation in the lexer of the interpreter.
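To make the token listing concrete, here is a small Python sketch that reproduces it for this one program. It is not the interpreter's lexer: it covers only the token kinds used in the example, the `tokenize` function and `TOKEN_SPEC` table are hypothetical, and it discards newlines because the listing above shows no newline token.

```python
import re

# Order matters: keywords come before ID so "so" is not read as an identifier.
TOKEN_SPEC = [
    ("WELET",  r"we[ \t]+let\b"),
    ("SO",     r"so\b"),
    ("PRINT",  r"print\b"),
    ("PLUS",   r"plus\b"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("SKIP",   r"[ \t\r\n]+"),  # whitespace and newlines (dropped in this sketch)
]
_MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source: str):
    """Return a list of (kind, value) pairs for the subset defined above."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = _MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        kind = match.lastgroup
        text = match.group(0)
        if kind == "NUMBER":
            tokens.append((kind, float(text) if "." in text else int(text)))
        elif kind != "SKIP":
            tokens.append((kind, text))
        pos = match.end()
    return tokens
```

Calling `tokenize("we let x = 10\nso print x plus 5")` yields the nine tokens listed above, starting with `("WELET", "we let")` and ending with `("NUMBER", 5)`.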
Summary¶
EasyLang's lexical structure is simple, consistent, and heavily English-inspired:
- English keywords
- Natural operators
- No punctuation-heavy syntax
- Strings use quotes
- Numbers are auto-detected
- Blocks use [ and ]
- Comments use $ (single-line) or $$ ... $$ (multi-line)
Next Steps¶
Continue to Grammar to learn how these tokens form full programs.