



The Ross Dictionary

Copyright July 2010, Edwin E. Ross


Transient, Persistent and Perennial Storage for
Objects, Programs and Systems.

Preface

The design within has not been rendered to code; therefore, it is a work in progress. As such, this document may have errors and omissions that will be corrected later.

Any and all technology improvements mentioned in this document are donated to the world; there are no patents, no registered trademark, nor any secret technology.


Dedicated to my wife, without whom this document would not be possible.



Introduction

This document describes a code/object base, its requirements, structure, properties, methods and uses. It stores code and objects for any duration, whether transient, persistent or perennial. Its methods create, modify and destroy code and objects. For example, objects may include source code, byte code, and machine code. Objects also include, but are not limited to, the following: each character in a character set, whitespace, comments, numbers, strings, programs and systems. In this context, an object is any software that is processed by a computer. In this document, the term code/object base is shortened to object base.

This object base is called the Ross Dictionary, or the Dictionary, because it contains alphabetized symbols with definitions. An object is stored in a Dictionary Entry, and consists of a required unique Symbol and an optional Definition. A Definition may provide several definitions of a Symbol, in other words, a Definition may be a list of definitions.

A search method can find a Symbol among Entries in the Dictionary and return an integer subscript of the Entry; that subscript is called a Token. Both Symbol and Token identify a specific Entry; consequently, they are interchangeable aliases.

This Dictionary has been designed for a metaprogramming language named Ameba, which is mentioned a few times in this document. However, another language may be developed to use the Dictionary. Ameba is as interactive as a calculator, and has an extensible, calculator-like infix syntax and an extensible user interface. An enhanced calculator-like user interface provides an Integrated Development Environment (IDE) that combines an editor, translator, debugger, and regression test manager. This document, however, is about the Dictionary, not Ameba or its IDE.

Requirements

The requirements of this Dictionary are listed below.

  • It shall be an enhanced symbol table for a language translator.

  • It shall be stored in a memory image file.

  • The image file shall give access to all or parts of a Dictionary.

  • It must be a combination code base and object database.

  • It shall be runtime memory for a language translator.

  • It shall give access to an Entry directly via a numeric value called a Token.

  • It shall give access to an Entry by alphabetical search.

  • It will facilitate meta-processing of Entries, Symbols and Definitions.

  • It must facilitate translating source code into byte code.

  • It must facilitate translating byte code into source code.

  • It shall facilitate system reports and graphs.

  • It must have one copy of expressions, including loops and conditionals.

  • It shall store whole systems for an enterprise.

  • It will archive project information from planning to postmortem.

  • It shall provide version control, when needed.

Structure

The Dictionary is structured both physically and logically. Physical layout is controlled by C struct statements that organize data in memory. These C struct statements are straightforward and are fully documented below. Logical layout is established as syntax analysis processes programs and builds a phrase tree. This document does not and should not specify syntax rules, because such rules specify or partly specify a computer language and not an object base such as the Dictionary.

In this document, a lexeme is identified by lexical analysis and expressions are identified by syntactic analysis. An expression can be small or large. Smaller expressions include calculations, conditions, loops, and calls. Larger expressions include blocks and methods. The largest expressions include programs and systems. The Dictionary design provides for either 32-bit or 64-bit addresses, which means a Dictionary can be large enough to contain all developed and purchased software for an enterprise.

The Dictionary is designed to be an enhanced symbol table and general object base for a metalanguage interpreter. The interpreter initializes the Dictionary with built-in data and methods, including the ones required in this document and others required by the interpreter. Lexical analysis scans characters to make lexemes which include names, keywords, literals, whitespace, punctuation, comments, operators, expressions and programs. Syntactic analysis scans lexemes to make expressions that include both subordinate and superordinate expressions. Subordinate expressions include both parenthetical expressions and method calls, and superordinate expressions include expressions and methods that use an expression, either by call or inclusion.

Each Entry in the Dictionary contains Symbol and Definition pointers, with Symbol and Definition in the Heap, as shown below.


Illustration 1: Entry, Symbol, Definition and Heap

Symbols

A Symbol is a string of one or more Tokens. Character Entries are built in, and the character code of each equals the Token value of its Entry; consequently, character strings are also Token strings. A lexeme is a character string (i.e., a Token string) that is either punctuation, an operator, a name, a keyword, a literal, a comment, whitespace, or a nonsense string. Finally, a compound Token string contains at least one Token that is not a character.

Parsing, herein, includes both lexical and syntactic processing. Syntactic analysis builds Symbols from compound Token strings while parsing expressions that have subexpressions and method calls. Consequently a phrase tree is entwined within the Symbols. Moreover, a phrase tree may be expanded to produce source code, by substituting Symbols for Tokens until the expansion contains only characters. Each Symbol is unique; thus equal expressions occur only once in a Dictionary, with references to that Entry from wherever the expression is used. This means editing an expression affects every reference to it in a Dictionary, a feature that is both very powerful and potentially dangerous. Fortunately, application programs do not change symbols; only meta programs do.

Three processes will make symbols. First, language initialization will make built-in entries. Second, lexical analysis of code makes Symbols that are character strings. Finally, syntax analysis makes Symbols that are compound Token strings.

Structs

The Dictionary, coded in C, is composed of a heap and several structs. The heap is named Heap and the structs are named Main, EntryArray, Entry and Sorted. The Heap, Main, EntryArray and Sorted, shall be allocated memory during runtime. The EntryArray contains Entries with a size and a Heap address for the Symbol and a size and reference for the Definition. Every Entry contains a Symbol in the Heap. The Definition may be empty, may reference an alias Entry, or may reference the Heap.

If the Definition is empty, both size and reference are zero. If the Definition is an alias, the size is zero and the reference is a Token to an alias Entry. If the Definition is in the Heap, the size is not zero and the reference is a Heap address.

The Dictionary contains Entries with Symbols and Definitions, which a program can create, destroy, read and write. Entries, Symbols and Definitions may be either persistent or transient, because a Dictionary is both archival and main memory for whatever metaprogram that uses it.

Each Dictionary Entry identifies a unique Symbol and a Definition; they are stored in the Heap. A Definition may be whatever is needed, for example, a list having several Symbol attributes, including a number of overloads. The type of a Definition partly depends on its size. When a Definition size equals one, the Definition type is a character. Otherwise, the Definition type is determined by its first few bytes as described later in this document.

The Heap is divided into variable length fragments, and each fragment contains either a Symbol or Definition. The size and address of each fragment is stored in one and only one Entry, which simplifies garbage collection. Symbols that are Token strings are always terminated with a zero or NUL Token, and may be shorter than the fragment size. Definition data types are not specified in this document, because only the application that creates a Dictionary specifies these types.

Sorted is an array of Tokens that are alphabetically sorted by Symbol. Each Token identifies an Entry, and the Entry has a pointer to a Symbol. Thus, a program may sort the Tokens in Sorted, and subsequently may search for a Symbol using Sorted. Searching yields a Token.

A Dictionary is composed of one small memory allocation and three large memory allocations. The small allocation is called Main, and, among other things, it contains the address, maximum size, and used size for each of the three large memory allocations. The large allocations are called EntryArray, Heap, and Sorted. A dictionary is saved to a file by writing the contents of these four allocations.

Properties

Properties are data items within object classes. However, using C++ struct classes incurs unwanted overhead bytes. Therefore, the structs Main, EntryArray, Sorted, Heap and Entry are C structs, without the extra overhead. Although the structs described herein are not C++ classes, they are, nonetheless, objects for languages using the Dictionary, and they are described as classes.

Main is a fixed size struct that contains properties required to manage Tokens and variable sized memory allocations of EntryArray, Heap, and Sorted. These properties are the address, used size, and maximum size for each of EntryArray, Heap, and Sorted. Main also has properties for managing Tokens, which do not require a memory allocation. The Token properties establish the current range of Token values.

EntryArray is a struct that contains the struct Entries and MaxEntryTokens, which is the number of Entries in EntryArray and is the maximum number of Tokens. Tokens are positive integer values.

Entry is a struct in EntryArray that contains the size and Heap address of a Symbol and the size and location of a Definition. If the Definition size is zero, the location is a Token that identifies an alias Entry. If the Definition size is not zero, the location is a Heap address.

Heap is an array of bytes. It is divided into variable length fragments that contain Symbols and Definitions.

Sorted is an array of Tokens. It varies in size as the number of Entries varies, thus the number of Tokens in Sorted equals MaxEntryTokens.

Methods

Methods are code used to manage properties. Private methods are those used only by the language that uses the Dictionary, and public methods are those used by people programming in the language.

Public methods shall create, modify, and destroy Entries. Private methods shall create, modify, and destroy the Heap, and Sorted. Public methods shall read, write and modify Symbols and Definitions. Private methods shall increase and decrease the range of Token values corresponding with the number of Entries in the EntryArray. The public methods that create and destroy Entries call the private Token methods. Methods to create, modify, and destroy Symbols and Definitions call private Heap methods.

Uses

The Dictionary was designed for Ameba, which is a metaprogramming language that has an extensible syntax. In other words, the main use for the Dictionary is for language processing software, including translators and interpreters. A Dictionary and its methods facilitate using a Dictionary as both a file and memory.

Design

This paragraph, Design, and subparagraphs reiterate information already given, and additional details and comments expound and clarify features of the Dictionary. These Design details are in subordinate paragraphs titled as follows: Tokens, Entries, Symbols, Definitions, EntryArray, Heap, Sorted, and Main.

The C struct statements are herein described as objects with properties and methods. The interconnections among the structs are explained. Methods are identified by name and function, and are characterized as public or private. Finally, memory allocation is discussed.

A Dictionary has many Entries, which consist of a Symbol and Definitions.

A Token is an integer that identifies an Entry.

Symbols and Definitions are variable length and are stored in the Heap.

A Dictionary is organized in several ways, physically as struct statements, randomly by Token, alphabetically by Symbol, and logically as phrase trees. The physical organization is a framework for managing information in the Dictionary, and the other three facilitate access to the information.

Dictionary

A Dictionary is composed of one small and three large memory allocations. The small allocation is called Main, and, among other things, contains the address, maximum size, and used size of the three large memory allocations. The structures requiring large allocations are called EntryArray, Heap, and Sorted. A Dictionary can be saved to a file by writing the contents of these four allocations. Each Dictionary has its own file.

Tokens

A Token is a subscript of an Entry within the EntryArray. Token fields within C structs are long integers. Thus, a Token may be either 32 or 64 bits (that is, 4-bytes or 8-bytes) depending on computer architecture. The initial Token size in every Token string is 1-byte. However, Token size and sign can change within a Token string, because a few Tokens are reserved for that purpose. Token sizes can be 1, 2, 4 or 8 bytes. Token fields are unsigned.

typedef unsigned long t_Token;

C Code 1: Token

To minimize memory, smaller Tokens should be favored. Assume ISO/IEC 646 is used in a Dictionary, because it is commonly used for source code. ISO/IEC 646 character strings consume 7 bits of a byte per character, because there are only one hundred twenty-eight characters. However, a byte can contain values from 0 to 255, which means another 128 values can be used for other Tokens. Using ISO/IEC 646 helps conserve memory compared to larger character sets. If a sixteen bit character code is used, then Tokens for built-ins must be thirty-two bits, which makes a Dictionary larger.

Token values, from 128 to 255 or more, shall be reserved for built-in Entries required by an interpreter or other metaprogram. Token values for many Entries will be too large for a 1-byte field; thus, larger token sizes must be used.

In this day and age, memory conservation may seem archaic due to multi-gigabyte memory sizes. However, a Dictionary is designed to contain all the systems owned by an enterprise, which might be a multi-terabyte object base. For this reason, frugal memory use is warranted.

The default Token string size is 1-byte unsigned, and the first Token in such a string must be 1-byte. But, all Token sizes may occur in a single Token string. Several built-ins switch from 1-byte to 2, 4 and 8 byte Tokens, and back. This scenario means that multi-byte tokens must be stored big-endian (i.e., most significant byte first). Otherwise, some Token processing algorithms will be overly complex.
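The big-endian storage rule can be illustrated with a minimal sketch. The function names putTokenBE and getTokenBE are hypothetical and are not among the methods listed in this document; the sketch only assumes that a Token of 1, 2, 4 or 8 bytes is written most significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store `value` in `width` bytes (1, 2, 4 or 8),
   most significant byte first. */
void putTokenBE(unsigned char *dst, size_t width, uint64_t value)
{
    for (size_t i = 0; i < width; i++) {
        dst[width - 1 - i] = (unsigned char)(value & 0xFFu);
        value >>= 8;
    }
}

/* Hypothetical helper: read a `width`-byte big-endian Token back. */
uint64_t getTokenBE(const unsigned char *src, size_t width)
{
    uint64_t value = 0;
    for (size_t i = 0; i < width; i++)
        value = (value << 8) | src[i];
    return value;
}
```

With this layout, a reader can consume a Token string strictly left to right, one byte at a time, regardless of the byte order of the host computer.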

A Dictionary must be initialized with an Entry for each character in whatever character set is used. Moreover, the Token for each character Entry must equal its character code. These conditions are required for proper encoding and decoding. Lexical and syntactic analysis translates a source code character string into Token strings, and the reverse is possible: Token strings in a Dictionary translate into character strings. Character strings are source code, and Token strings are byte code. Tokens that identify Entries are either code, data, or both.
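The character initialization described above might look like the following sketch. It assumes ISO/IEC 646 (128 codes) and uses a simplified, hypothetical struct CharEntry that holds only the Symbol fields; initCharEntries is also a made-up name.

```c
#include <stddef.h>

typedef unsigned char uchr;
typedef unsigned long t_Token;

/* Simplified, hypothetical Entry: only the Symbol fields are needed here. */
struct CharEntry {
    long  SymSize;   /* 1 for a character Symbol */
    uchr *SymAddr;   /* address of the Symbol in the heap */
};

enum { CHARSET_SIZE = 128 };   /* assumption: ISO/IEC 646 */

/* Create one Entry per character so that Token == character code.
   `heap` must hold at least CHARSET_SIZE bytes. */
void initCharEntries(struct CharEntry *entries, uchr *heap)
{
    for (t_Token t = 0; t < CHARSET_SIZE; t++) {
        heap[t] = (uchr)t;             /* the Symbol is the character itself */
        entries[t].SymSize = 1;
        entries[t].SymAddr = &heap[t];
    }
}
```

After this loop, indexing the Entries by a character code yields the Entry for that character, which is the property that lets a character string double as a Token string.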

Symbols

A Symbol is a string of one or more Tokens. Character Entries are built in, and the character code of each equals the Token value of its Entry; consequently, character strings are also Token strings. A lexeme is a character string (i.e., a Token string) that is either punctuation, an operator, a name, a keyword, a literal, a comment, whitespace, or a nonsense string. Finally, a compound Token string contains at least one Token that is not a character; see Table 1, Example Symbols.

While each character is a Symbol, a character may not be the name of code or data, such as a variable or method name. Consequently, the two may not be spelled the same. In this document, a character is identified by a single quote preceding it, for example 'A; whereas a Symbol that is a name, such as X, is not preceded by a single quote. Refer to Table 1, Example Symbols, which is similar to a sparsely populated Entry array. This array is a partial text example of a Dictionary with no Definitions and with "Pretty Print" annotations. Table 1 shows the Symbols necessary for the expression (A+B)**2+F(11).

Table 1, Example Symbols

Token   Symbol in Heap as Token string   Pretty Print
40      40                               '(
41      41                               ')
42      42                               '*
43      43                               '+
49      49                               '1
50      50                               '2
65      65                               'A
66      66                               'B
70      70                               'F
140     40 0                             (
141     41 0                             )
142     42 0                             *
143     43 0                             +
149     49 0                             1
150     50 0                             2
165     42 42 0                          **
300     65 0                             A
327     66 0                             B
362     70 0                             F
502     140 300 143 327 141 0            (A+B)
518     49 49 0                          11
672     362 140 518 141 0                F(11)
707     502 165 150 0                    (A+B)**2
785     707 143 672 0                    (A+B)**2+F(11)



Parsing, herein, includes both lexical and syntactic processing. Lexical analysis translates characters into lexemes. The character set Entries (0 through 127) have built-in definitions that do lexical processing, whereas Entries above 127 have definitions that do syntactic analysis and expression evaluation. Syntactic analysis builds Symbols with compound Token strings while parsing expressions that have subexpressions and method calls.

After parsing, Symbols contain a phrase tree, see "Table 1, Example Symbols" and "Illustration 2, Example Phrase tree". The root of this example tree, Token 785, is the last symbol in Table 1.


Illustration 2: Example Phrase tree

Each node of this phrase tree contains a bold integer, which is the Token of the node (for example, Token 785 at the root), and the leaves contain pretty-printed lexemes. Definitions of Symbols may have an attribute list with additional data for phrase trees, for example expressions in prefix form. Parse tree leaves are lexemes, though character-level detail exists in the Dictionary, because character definitions contain lexical analysis programs.

Before byte code is optimized, phrase trees may be expanded into source code identical to the source that was parsed; afterward, expanded source is optimized. Expansion is a simple process: recursively substitute Symbols for non-character Tokens in the Symbol being expanded. The expansion process terminates when the Symbol being expanded contains only characters. A meta process may create a concrete syntax tree from a phrase tree. Typically a concrete syntax tree will not expand to produce source identical to the parsed source, but it may expand into an equivalent program. Some meta processes, however, for example partial evaluation, will produce concrete syntax trees that are significantly different from the parsed source.
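The expansion process can be sketched with the Symbols of Table 1. This is only an illustration under stated assumptions: the Symbols are held as plain integer arrays rather than byte strings in the Heap, lookup is a linear scan instead of an EntryArray subscript, and the names SymRow, findSymbol and expandToken are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Symbols from Table 1, held as plain integer arrays for readability
   (an assumption; the Dictionary stores them as byte strings in the Heap). */
struct SymRow { unsigned token; unsigned sym[8]; };

static const struct SymRow table1[] = {
    {140, {40, 0}}, {141, {41, 0}}, {143, {43, 0}}, {150, {50, 0}},
    {165, {42, 42, 0}}, {300, {65, 0}}, {327, {66, 0}}, {362, {70, 0}},
    {502, {140, 300, 143, 327, 141, 0}}, {518, {49, 49, 0}},
    {672, {362, 140, 518, 141, 0}}, {707, {502, 165, 150, 0}},
    {785, {707, 143, 672, 0}},
};

/* Linear lookup; a real Dictionary would index EntryArray by Token. */
static const unsigned *findSymbol(unsigned token)
{
    for (size_t i = 0; i < sizeof table1 / sizeof table1[0]; i++)
        if (table1[i].token == token)
            return table1[i].sym;
    return NULL;
}

/* Append the source text of `token` to the NUL-terminated string `out`
   (capacity `cap`). Returns 0 on success, -1 if the buffer is full or a
   Token is unknown. */
int expandToken(unsigned token, char *out, size_t cap)
{
    if (token < 128) {                    /* a character: emit it directly */
        size_t len = strlen(out);
        if (len + 1 >= cap)
            return -1;
        out[len] = (char)token;
        out[len + 1] = '\0';
        return 0;
    }
    const unsigned *sym = findSymbol(token);
    if (sym == NULL)
        return -1;
    for (; *sym != 0; sym++)              /* 0 terminates the Symbol */
        if (expandToken(*sym, out, cap) != 0)
            return -1;
    return 0;
}
```

Starting from an empty buffer, expanding Token 785 walks the phrase tree depth first and rebuilds the expression (A+B)**2+F(11) one character at a time.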

Each Symbol is unique; thus equal expressions occur only once in a Dictionary. This means editing an expression affects every reference to it in a Dictionary, a feature that is both very powerful and potentially dangerous. Fortunately, application programs do not change symbols; only meta programs do.

Three processes will make symbols. First, language initialization will make built-in entries, including 128 character entries and built-ins required by the interpreter. Second, lexical analysis of code makes Symbols that are character strings. Finally, syntax analysis makes Symbols that are compound Token strings.

Structs

The Dictionary, coded in C, is composed of a heap and several structs. The heap is named Heap and the structs are named Main, EntryArray, Entry and Sorted. The Heap, Main, EntryArray and Sorted, shall be allocated memory during runtime. The EntryArray contains Entries with a size and a Heap address for the Symbol and a size and reference for the Definition, wherein a reference may be either a Token or an address. Entries contain a Symbol in the Heap, except for Entries that are not currently used. The Definition may be empty, may reference an alias Entry by Token, or may reference the Heap by address.


Illustration 3: Entry, Symbol, Definition and Heap

A Dictionary is composed of one small memory allocation and three large memory allocations. The small allocation is called Main, and, among other things, it contains the address, maximum size, and used size for each of the three large memory allocations. The large allocations are called EntryArray, Heap, and Sorted. A dictionary is saved to a file by writing the contents of these four allocations.

Properties are data items within object classes. However, using C++ struct classes incurs unwanted overhead. Therefore, the structs Main, EntryArray, Sorted, Heap and Entry are C structs, without the extra overhead. Although the structs described herein are not C++ classes, they are, nonetheless, objects for languages using the Dictionary, and they are described as classes.

Methods are code used to manage properties. Private methods are those used only by the language that uses the Dictionary, and public methods are those used by people programming in the language.

Entry

Each Dictionary Entry contains a unique Symbol and a Definition; although, a Definition may contain several definitions for a Symbol. The type of a Definition first depends on its size. When a Definition size equals one, the Definition type is a character. Otherwise, the Definition type is determined by its first few bytes as described later in this document.

An Ameba Dictionary is composed of Symbols and Definitions, much like a dictionary for a spoken language, except it is designed for programming. Consequently, it may contain binary data. A Symbol is either a character, a lexeme or an expression. A Definition may be a lexeme, expression or other code and data, including, but not limited to, file names, links to other Dictionaries, and links to servers, such as a database server.

Each Entry within EntryArray contains a unique Symbol and optionally a Definition. Programs may search for a Symbol by using an array that contains Tokens. As stated before, a Token is a subscript within the EntryArray that identifies a unique Entry. The Heap, a large memory allocation, is divided into variable length fragments, which are either Symbols or Definitions. Each Entry struct contains the size and Heap address of its Symbol fragment and the size and location of its Definition.

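typedef unsigned char uchr;

struct s_Entry {
 long  SymSize;      /* size of the Symbol; 0 means the Entry is unused */
 uchr *SymAddr;      /* address of the Symbol in the Heap */
 long  DefSize;      /* size of the Definition; 0 means empty or alias */
 union {
   uchr   *Addr;     /* Heap address, used when DefSize is not zero */
   t_Token Token;    /* alias Entry, used when DefSize is zero */
   } Def;            /* member name, so Def.Addr and Def.Token are valid */
 };

typedef struct s_Entry t_Entry;

C Code 2: Entry

Entry Properties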

Entry is a struct in EntryArray that contains SymSize, SymAddr, DefSize and either Def.Addr or Def.Token.

SymSize is the size of a Symbol. If SymSize equals one, the Symbol is one of the character set. Otherwise, the Symbol is a Token string. If SymSize equals zero, the Entry is not currently used.

SymAddr is the address of a Symbol in the Heap.

DefSize has different meanings, depending on whether its value is zero or not. Whenever DefSize is not zero, Def.Addr is used instead of Def.Token, and conversely, whenever DefSize is zero Def.Token is used.

Def.Addr and its alias Def.Token identify either a Heap address or an Entry as the Definition of the current Entry. Whenever both DefSize and Def.Token are zero, the Definition of the current Entry is empty. Whenever DefSize is zero but Def.Token is not zero, the Definition of the current Entry is a Token of an alias Entry. Whenever both DefSize and Def.Addr are not zero, the current Definition is in the Heap.
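The three Definition states can be told apart with a small helper. The sketch below is an illustration only: the enum DefKind and the function defKind are hypothetical names, and the union is declared as a member named Def so that Def.Addr and Def.Token are valid C.

```c
#include <stddef.h>

typedef unsigned char uchr;
typedef unsigned long t_Token;

struct s_Entry {
    long  SymSize;
    uchr *SymAddr;
    long  DefSize;
    union {
        uchr   *Addr;
        t_Token Token;
    } Def;
};

/* Hypothetical classification of a Definition. */
enum DefKind { DEF_EMPTY, DEF_ALIAS, DEF_HEAP };

enum DefKind defKind(const struct s_Entry *e)
{
    if (e->DefSize != 0)
        return DEF_HEAP;                   /* Def.Addr is a Heap address */
    return e->Def.Token == 0 ? DEF_EMPTY   /* size and reference both zero */
                             : DEF_ALIAS;  /* Def.Token names another Entry */
}
```

One consequence of this encoding is that Token 0 can never be the target of an alias, because a zero reference with a zero size already means an empty Definition.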

Symbol

Each Symbol is stored in a fragment of the Heap as a 1-byte Token string, in which some 2, 4 and 8-byte Tokens may occur. A few built-ins are used similarly to forcing characters in character strings. These built-ins are 1 byte each, and switch the Token size from 1 byte to 2, 4 or 8 bytes. There are seven built-ins that switch the size of Tokens. Three switch only the very next Token to a 2, 4 or 8-byte Token. Another three switch a following substring of Tokens to a larger size. Finally, one built-in terminates a larger substring of Tokens. These seven Token values and their meanings follow:

  • 128 next Token is 2-bytes,

  • 129 next Token is 4-bytes,

  • 130 next Token is 8-bytes,

  • 131 next Token substring contains 2-byte Tokens,

  • 132 next Token substring contains 4-byte Tokens,

  • 133 next Token substring contains 8-byte Tokens,

  • 0 substring terminator.
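A decoder for these size-switching built-ins might be sketched as follows. Several details are assumptions rather than statements of this document: value 130 is taken to mark a single 8-byte Token (completing the 2, 4, 8 series), multi-byte Tokens are read big-endian, a zero Token of the current width ends a wide substring, and a 1-byte zero ends the whole Token string. The names readBE and decodeTokens are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian Token of `width` bytes. */
static uint64_t readBE(const unsigned char *p, size_t width)
{
    uint64_t v = 0;
    for (size_t i = 0; i < width; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode a Token string into `out` (capacity `cap`).
   Returns the number of Tokens decoded, or -1 if `out` is too small. */
long decodeTokens(const unsigned char *s, uint64_t *out, size_t cap)
{
    size_t n = 0;
    size_t width = 1;                     /* current Token width in bytes */
    for (;;) {
        uint64_t t = readBE(s, width);
        s += width;
        if (t == 0) {
            if (width == 1)
                return (long)n;           /* end of the Token string */
            width = 1;                    /* assumed: end of a wide substring */
            continue;
        }
        if (width == 1 && t >= 128 && t <= 130) {
            size_t w = (size_t)2 << (t - 128);    /* 2, 4 or 8 bytes */
            t = readBE(s, w);             /* exactly one wide Token follows */
            s += w;
        } else if (width == 1 && t >= 131 && t <= 133) {
            width = (size_t)2 << (t - 131);       /* wide substring begins */
            continue;
        }
        if (n == cap)
            return -1;
        out[n++] = t;
    }
}
```

Under these assumptions, the Symbol of Token 785 in Table 1 (707 143 672 0) would occupy eight bytes: 128, 0x02, 0xC3, 143, 128, 0x02, 0xA0, 0.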

The simplest kind of Symbol is a character. In fact each character in the set of characters being used by a translator shall have an Entry in the Dictionary. This requirement allows Tokens to be translated into their print names. Additionally, for Ameba the character Entries are among the built-ins that lexical analysis uses to create lexemes, such as whitespace, comments, literals and symbols.

The next simplest kind of Symbol is a lexeme, which is a string of characters, for example, "Temp," "search," and “The Statue of Liberty.” Typically compilers and interpreters discard some lexemes during lexical analysis. However, lexical analysis for a metalanguage such as Ameba or English should not discard any lexemes. Consequently, the Dictionary places no limits on lexemes or, for that matter, on Symbols.

The most complex kinds of Symbol are expressions, including calculations, conditions, loops, calls, subexpressions, and expressions. The simplest expressions, such as A+B, occur in larger expressions, for example, (A+B)+C. A method is a collection of expressions. An object is a collection of data and methods. A program is a collection of objects. And, a system is a collection of programs. All these are complex expressions that may be stored as Symbols in a Dictionary.

Definition

Definitions are stored in parts of the Heap called fragments, which may be either code or data. Note, however, that both source and byte code are stored in Symbols; thus, code stored in a Definition is likely to be another form of code, such as machine-code. Moreover, source code stored in Symbols will contain source data; thus, data stored in a Definition is likely to be another form of data, such as binary data. Symbols used as variable names shall have Definitions with various uses, including variable data stores and machine code methods.

Dictionary based requirements of Definitions are Spartan. The form and content of Definitions are mostly reserved for a meta interpreter that uses the Dictionary. Two bytes of the Heap memory used for a Definition are reserved for type data.

Entry Methods

The Entry methods are the following: createEntry, destroyEntry, getSymSize, putSymSize, getDefSize, putDefSize, getDefToken, putDefToken, getDefAddr, putDefAddr, createSymbol, destroySymbol, createDefinition and destroyDefinition.

EntryArray

A Dictionary has an array of Entries, in which subscripts, called Tokens, identify specific Entries. MaxEntryTokens is the maximum number of Tokens allowed. Each Token identifies an Entry that may be either built-in or user-defined. Among the built-ins are the characters, which are defined as Entries 0-127, corresponding to the ISO/IEC 646 character codes.

struct s_EntryArray {
 long    MaxEntryTokens;   /* number of Entries in the array */
 t_Entry Entries[];        /* the Entries, subscripted by Token */
 };

typedef struct s_EntryArray t_EntryArray;

C Code 3: EntryArray

EntryArray Properties

EntryArray is a struct that contains MaxEntryTokens and the struct Entries.

MaxEntryTokens is the number of Entries in EntryArray; thus, it is the maximum number of Tokens that will exist.

Entries is an array with each element being an Entry.

EntryArray Methods

EntryArray calls include collectEntryArrayGarbage, getMaxEntryTokens and setMaxEntryTokens.

The collectEntryArrayGarbage method may move an Entry from one position in EntryArray to another. However, all occurrences of the Token for a moved Entry must be updated with the new Token value. Consequently, this kind of garbage collection is slow and should be avoided for as long as possible. Unused Entries may be reused to avoid moving Entries.
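Reusing unused Entries, as suggested above, could start with a scan such as the following sketch. It relies on the rule that SymSize equal to zero marks an unused Entry; the struct SlotEntry and the function findUnusedEntry are hypothetical and simplified.

```c
#include <stddef.h>

typedef unsigned long t_Token;

/* Only the field needed here; SymSize == 0 marks an unused Entry. */
struct SlotEntry { long SymSize; };

/* Return the Token of the first unused Entry at or after `first`,
   or `count` if every Entry from `first` onward is in use. */
t_Token findUnusedEntry(const struct SlotEntry *entries,
                        t_Token count, t_Token first)
{
    for (t_Token t = first; t < count; t++)
        if (entries[t].SymSize == 0)
            return t;
    return count;
}
```

A createEntry method could call such a scan before growing the EntryArray, so that existing Tokens keep their values and no references need updating.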

Heap

The heap is memory for Symbols and Definitions, and must be allocated during runtime. Entries contain data needed to manage memory for Symbols and Definitions that are allocated in the Heap.

Heap Properties

There are none.

Heap Methods

The Heap methods are the following: createHeap, destroyHeap, reallocateHeap, createFragment, destroyFragment, reallocateFragment, getFragment, and setFragment.
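As an illustration of how createFragment might work, the sketch below hands out fragments from the unused end of the Heap. The document does not specify the allocation strategy, so this simple bump allocation is an assumption, as is the struct HeapState, whose field names merely mirror the Main properties.

```c
#include <stddef.h>

typedef unsigned char uchr;

/* Hypothetical Heap bookkeeping; the names mirror the Main properties. */
struct HeapState {
    uchr  *HeapAdr;        /* start of the Heap */
    size_t HeapSizeMax;    /* bytes allocated to the Heap */
    size_t HeapSizeUsed;   /* bytes handed out so far */
};

/* Carve `size` bytes off the unused end of the Heap.
   Returns NULL when the Heap is full (a caller could then reallocate). */
uchr *createFragment(struct HeapState *h, size_t size)
{
    if (size > h->HeapSizeMax - h->HeapSizeUsed)
        return NULL;
    uchr *frag = h->HeapAdr + h->HeapSizeUsed;
    h->HeapSizeUsed += size;
    return frag;
}
```

The returned address and the requested size are what an Entry would record as SymAddr and SymSize, or as Def.Addr and DefSize.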

Sorted

Sorted is an array of Tokens that are alphabetically sorted by Symbol. Each Token identifies an Entry, and the Entry has a pointer to a Symbol. The Symbol part of an Entry contains a pointer into the Heap and the size of memory allocated for the Symbol. A program may search for a Symbol using Sorted. The search result is a Token.

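struct s_Sorted {
 /* Note: ISO C requires at least one named member before a flexible array member. */
 t_Token Tokens[];   /* Tokens ordered alphabetically by Symbol */
 };

C Code 4: Sorted

Sorted Properties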

There are none.

Sorted Methods

Sorted methods include the following: createSorted, destroySorted, reallocateSorted, putToken, removeToken, sortSorted, and searchSorted.
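A searchSorted method could be a binary search over the Tokens in Sorted, comparing the Symbol each Token identifies. The sketch below is not the author's implementation: it assumes Symbols are NUL-terminated byte strings compared with strcmp, uses a simplified hypothetical struct SymEntry, and returns a made-up NOT_FOUND value when the Symbol is absent.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long t_Token;

/* Simplified, hypothetical Entry: the Symbol only. */
struct SymEntry { const char *SymAddr; };

#define NOT_FOUND ((t_Token)-1)   /* assumption: no Entry has this Token */

/* Binary search over `sorted` (Tokens ordered by Symbol).
   Returns the Token whose Symbol equals `symbol`, or NOT_FOUND. */
t_Token searchSorted(const struct SymEntry *entries, const t_Token *sorted,
                     size_t count, const char *symbol)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(symbol, entries[sorted[mid]].SymAddr);
        if (cmp == 0)
            return sorted[mid];
        if (cmp < 0)
            hi = mid;          /* target sorts before the middle Symbol */
        else
            lo = mid + 1;      /* target sorts after the middle Symbol */
    }
    return NOT_FOUND;
}
```

Because Sorted holds Tokens rather than Symbols, the Entries themselves never move when the order changes; only the small Token array is rearranged.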

Main

The struct Main contains data to manage all the Dictionary structures, and to assure they can be quickly written to a file as an image of memory. There are three subordinate buffers, each with an address, maximum size, and used size. Additionally, it will contain miscellaneous data for the Ameba interpreter, but this data is not discussed here and is excluded from the struct s_Main, shown in C Code 2: Main.

Main Properties

Main is a fixed size struct that contains properties required to manage Tokens and variable sized memory allocations of EntryArray, Heap, and Sorted. These properties are the address, used size, and maximum size for each of EntryArray, Heap, and Sorted. Main also has properties for managing Tokens, which do not require an allocation. The Token properties establish the current range of Token values.

HeapAdr is the address of a large string of characters called the Heap.

HeapSizeMax is the size of memory allocated to the Heap.

HeapSizeUsed is the amount of memory used. It is divided into many fragments that are used for Symbols and Definitions.

EntryArrayAdr is the address of the EntryArray.

EntryArraySizeMax is the total size of the EntryArray structure.

EntryArraySizeUsed is the number of bytes used in the EntryArray.

SortedAdr is the address of the array Sorted. Each element of the array contains a Token, and the Tokens are sorted alphabetically ascending by Symbol. Thus, the first Token in the array identifies the Entry whose Symbol is least alphabetically, and the last Token in the array identifies the Entry whose Symbol is greatest alphabetically.

MaxSizeSorted is the number of bytes in the Sorted array.

UsedSizeSorted is the number of bytes in the Sorted array currently being used.
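Gathering the nine properties above into one struct gives a sketch like the following. The field names come from the property list, but the field types are assumptions, the Token range properties and interpreter data are omitted, and the helpers heapFree and sortedFree are hypothetical.

```c
#include <stddef.h>

typedef unsigned char uchr;
typedef unsigned long t_Token;

/* Sketch of Main built from the property names listed above.
   Field types are assumptions; interpreter-specific data is omitted. */
struct s_Main {
    uchr    *HeapAdr;             /* address of the Heap */
    size_t   HeapSizeMax;         /* bytes allocated to the Heap */
    size_t   HeapSizeUsed;        /* bytes of the Heap in use */
    void    *EntryArrayAdr;       /* address of the EntryArray */
    size_t   EntryArraySizeMax;   /* total bytes in the EntryArray */
    size_t   EntryArraySizeUsed;  /* bytes of the EntryArray in use */
    t_Token *SortedAdr;           /* address of the array Sorted */
    size_t   MaxSizeSorted;       /* bytes allocated to Sorted */
    size_t   UsedSizeSorted;      /* bytes of Sorted in use */
};

/* Hypothetical helpers: bytes still free in an allocation. */
size_t heapFree(const struct s_Main *m)
{
    return m->HeapSizeMax - m->HeapSizeUsed;
}

size_t sortedFree(const struct s_Main *m)
{
    return m->MaxSizeSorted - m->UsedSizeSorted;
}
```

Keeping all three address, maximum size and used size triples in one fixed size struct is what allows the whole Dictionary to be located from a single starting point.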

Main Methods

Main methods are the following: createDictionary, destroyDictionary, getHeapAdr, setHeapAdr, getHeapSizeMax, setHeapSizeMax, getHeapSizeUsed, setHeapSizeUsed, getEntryArrayAdr, setEntryArrayAdr, getEntryArraySizeMax, setEntryArraySizeMax, getEntryArraySizeUsed, setEntryArraySizeUsed, getSortedAdr, setSortedAdr, getMaxSizeSorted, setMaxSizeSorted, getUsedSizeSorted, and setUsedSizeSorted.

Files

A Dictionary should be saved to its image file often. A small Dictionary may be loaded entirely into memory. A very large Dictionary, however, can be larger than memory, in which case demand swapping between file and memory is required, a process that can degrade performance in some cases. To ensure high performance, interpreter memory needs to be large enough to contain the most frequently used objects.
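One way to picture saving and loading an image, for the simple case of a Dictionary that fits in memory, is sketched below. It assumes the file holds the Main struct followed by the used part of the Heap, and that addresses stored in the file are discarded and re-established on load; this document does not specify the file layout, so saveImage, loadImage, and the layout itself are hypothetical. The struct is trimmed to the Heap properties, and demand swapping is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct s_Main {            /* trimmed to the Heap properties */
    char  *HeapAdr;
    size_t HeapSizeMax;
    size_t HeapSizeUsed;
};

/* Write Main, then the used part of the Heap. Returns 0 on success. */
int saveImage(const struct s_Main *m, FILE *f) {
    if (fwrite(m, sizeof *m, 1, f) != 1) return -1;
    if (m->HeapSizeUsed > 0 &&
        fwrite(m->HeapAdr, 1, m->HeapSizeUsed, f) != m->HeapSizeUsed)
        return -1;
    return 0;
}

/* Read Main, allocate a fresh Heap, then read the used part into it.
 * The address read from the file is stale and is never dereferenced. */
int loadImage(struct s_Main *m, FILE *f) {
    if (fread(m, sizeof *m, 1, f) != 1) return -1;
    m->HeapAdr = NULL;
    if (m->HeapSizeUsed > m->HeapSizeMax) return -1;   /* corrupt image */
    m->HeapAdr = malloc(m->HeapSizeMax ? m->HeapSizeMax : 1);
    if (m->HeapAdr == NULL) return -1;
    if (m->HeapSizeUsed > 0 &&
        fread(m->HeapAdr, 1, m->HeapSizeUsed, f) != m->HeapSizeUsed) {
        free(m->HeapAdr);
        m->HeapAdr = NULL;
        return -1;
    }
    return 0;
}
```

Writing each buffer as one contiguous block is what makes an image quick to save; the cost is that any address kept inside the image has to be treated as invalid after loading.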

Export/Import

Exporting part of a Dictionary creates a new, smaller Dictionary file. Tokens in the new Dictionary will often have different values than the corresponding Tokens in the original.

Importing a Dictionary into another Dictionary merges the two into one and may change Token values of the imported Entries.
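The reason Token values change can be seen in a small sketch. It assumes that a Token is the subscript of an Entry, that an imported Symbol already present in the target keeps the target's Token, and that new Symbols are appended. Symbols are modelled as plain C strings and the lookup is a linear scan for brevity; importSymbols and the remap table are hypothetical names, not part of this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned Token;    /* assumption: a Token is an Entry subscript */

/* Merge the Symbols in src into dst, which has room for dstMax Symbols.
 * remap[i] receives the Token that src[i] has in the merged Dictionary.
 * Returns the new Symbol count, or (size_t)-1 if dst is full. */
size_t importSymbols(const char **dst, size_t dstCount, size_t dstMax,
                     const char **src, size_t srcCount, Token *remap) {
    for (size_t i = 0; i < srcCount; i++) {
        size_t j;
        /* Look for an existing Entry with the same Symbol. */
        for (j = 0; j < dstCount; j++)
            if (strcmp(dst[j], src[i]) == 0) break;
        if (j == dstCount) {               /* not found: append a new Entry */
            if (dstCount == dstMax) return (size_t)-1;
            dst[dstCount++] = src[i];
        }
        remap[i] = (Token)j;               /* old Token i becomes Token j */
    }
    return dstCount;
}
```

After the merge, every Token string carried over from the imported Dictionary would need its Tokens rewritten through the remap table, since Definitions may refer to Entries by Token.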

Summary

A Dictionary is both a symbol table and an object base, designed to be used by a metalanguage. The Dictionary is an array of Entries that contain Symbols and Definitions. Definitions will be defined by a language that uses the Dictionary, except for a few built-ins that are specified in this document. Symbols are characters, lexemes and expressions, stored as Token strings and Tokens that are also byte codes. Phrase trees, created by parsing, are embedded within the many Symbols in a Dictionary.

Token strings are byte code that can be translated into source code for examining, editing, and printing. The byte code and phrase tree are for meta processing, including optimizing code, making graphical reports, compiling to machine code, managing code versions, etc.

The structure of a Dictionary is described in two modes, one with C struct statements and the other with compound Token string Symbols. The C structs are Main, EntryArray, Entry, Heap and Sorted. Complex Token string Symbols are Token strings that may be 1, 2, 4 or 8 byte values. The simplest symbols are characters, e.g., ISO/IEC 646. Less simple Symbols are ISO/IEC 646 strings that are lexical items, such as operators, comments, whitespace and alphanumeric names.

Conclusion

The Dictionary is basically a symbol table and phrase tree for a metalanguage. In particular, the Dictionary was designed for a new metalanguage called Ameba, which is an extensible calculator with its own Integrated Development Environment (IDE). Both the Ameba language and the IDE are extensible. In fact, much of the IDE client will be coded in Ameba Script that runs in a browser. Ameba Script is for user interface projects, such as the IDE. Ameba runs as a server and is for server projects, such as version control. An Ameba server stores code for either Ameba or Ameba Script, since Ameba Script cannot do file I/O.

An Entry for a Dictionary can link one Dictionary to another; thus, a Dictionary can maintain a list of other Dictionaries used by an enterprise, an application, a version, a test, etc. Each Dictionary is stored in its own file. A Dictionary Definition contains a type and a file name. A Dictionary compiled for a 64-bit machine can address many trillions of objects. Multiple Dictionary servers with clients allow for an almost unlimited number of simultaneous users, who can be technical, administrative, and managerial staff.

The Dictionary and the metalanguage Ameba will be an information systems tool with some unique capabilities. For example, metaprograms that process source code can help people as application programs help businesses; metaprograms that process natural language will do things we merely imagine today. Ameba will come with a rapid prototyping tool that uses a calculator, spreadsheet-like cells, and a screen layout tool, as described in the next paragraph.

The calculator operates interactively, like all calculators. It also operates as a single-cell spreadsheet that records a formula in an editor line and displays the value in the cell/accumulator. We can give that cell/accumulator a name. We can make a copy of that cell that may be dragged, dropped and sized anywhere on the screen. We can make many cells and organize them however we want, for example as rows and columns like a spreadsheet, as an entry/retrieval screen, as a menu, etc. When ready, we can save that layout as a worksheet and give it a name. Thus, we can prototype a program. Since this prototype will be stored in a Dictionary, it may be thoroughly tested, modified by programmers, and placed under version control.

Ameba and the Dictionary are synergistic. They combine to make a programming tool useful to almost anyone. People can use it as a calculator or spreadsheet. Businesses can use it for applications. Power programmers can use it for utilities. Metaprogrammers can use it for tools. And scientists can use it for artificial intelligence.
