SourceGen: Code Generation & Assembly


SourceGen can generate an assembly source file that, when fed into the target assembler, will recreate the original data file exactly. Every assembler is different, so support must be added to SourceGen for each.

The generation / assembly dialog can be opened with File > Generate Assembly.

If you want to show code to others, perhaps by adding a page to your web site, you can "export" the formatted code as text or HTML. This is explained in more detail below.

Generating Source Code

Cross assemblers tend to generate additional files, either compiler intermediaries ("file.o") or metadata ("_FileInformation.txt"). Some generators may produce multiple source files, perhaps a link script or symbol definition header to go with the assembly source. To avoid spreading files across the filesystem, SourceGen does all of its work in the directory where the project lives. A project has no directory until it has been saved, which is why you can't generate or assemble code until you've saved the project for the first time.

The Generate and Assemble dialog has a drop-down list near the top that lets you pick which assembler to target. The name of the assembler will be shown with the detected version number. If the assembler executable isn't configured, "[latest version]" will be shown instead of a version number.

The Settings button will take you directly to the assembler configuration tab in the application settings dialog.

Hit the Generate button to generate the source code into a file on disk. The file will use the project name, with the .dis65 extension replaced by _<assembler>.S.
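The naming rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not SourceGen's own code; the function name and the assembler identifier string ("64tass" here) are assumptions.

```python
from pathlib import Path

def generated_source_path(project_path: str, assembler_id: str) -> Path:
    """Map '<name>.dis65' to '<name>_<assembler_id>.S' in the same directory."""
    project = Path(project_path)
    if project.suffix != ".dis65":
        raise ValueError("expected a .dis65 project file")
    # Keep the directory, swap the extension for the assembler-specific suffix.
    return project.with_name(f"{project.stem}_{assembler_id}.S")

# Hypothetical project file and assembler identifier.
print(generated_source_path("work/Game.dis65", "64tass").name)
```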

The first 64KiB of each generated file will be shown in the preview window. If multiple files were generated, you can use the "preview file" drop-down to select between them. Line numbers are prepended to each line to make it easier to track down errors.
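A rough sketch of how such a preview could be built follows. It assumes the file is UTF-8 text and that truncation happens before line numbers are added; neither detail is stated above, and the function name is hypothetical.

```python
PREVIEW_LIMIT = 64 * 1024  # 64 KiB, per the text above

def build_preview(source_text: str, limit: int = PREVIEW_LIMIT) -> str:
    """Return the first `limit` bytes of the text with line numbers prepended."""
    # Clip by bytes, dropping any multi-byte character cut in half at the end.
    clipped = source_text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")
    lines = clipped.splitlines()
    width = len(str(len(lines)))  # pad numbers so the columns line up
    return "\n".join(f"{n:>{width}}  {line}" for n, line in enumerate(lines, 1))

print(build_preview("        lda #$00\n        rts"))
```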

Label Localizer

The label localizer is an optional feature that automatically converts some labels to an assembler-specific less-than-global label format. Local labels may be reusable (e.g. using "@LOOP" for multiple consecutive loops is easier to understand than giving each one a unique label) or reduce the size of a generated link table. There are usually restrictions on local labels, e.g. references to them may not be allowed to cross a global label definition, which the localizer factors in automatically.

Reserved Label Names

Some label names aren't allowed. For example, 64tass reserves the use of labels that begin with two underscores. Most assemblers will also prevent you from using opcode mnemonics as labels (which means you can't assemble the infinite loop jmp jmp jmp).

If a label doesn't appear to be legal, the generated code will use a suitable replacement (e.g. jmp_1 jmp jmp_1).
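The replacement step might look something like the following sketch. The mnemonic set is a tiny illustrative subset, the case-insensitive comparison is an assumption, and the numbering scheme is inferred only from the jmp_1 example above.

```python
MNEMONICS = {"jmp", "lda", "sta", "rts"}  # illustrative subset, not a full opcode list

def is_legal(label: str) -> bool:
    # Two example rules from the text: no opcode mnemonics, and no leading
    # double underscore (the 64tass restriction).
    return label.lower() not in MNEMONICS and not label.startswith("__")

def replacement_label(label: str, taken: set) -> str:
    """Return `label` if it is legal, else the first unused '<base>_<n>' name."""
    if is_legal(label):
        return label
    base = label.lstrip("_") or "L"
    n = 1
    while f"{base}_{n}" in taken or not is_legal(f"{base}_{n}"):
        n += 1
    return f"{base}_{n}"

print(replacement_label("jmp", set()))
```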

Platform-Specific Features

SourceGen needs to be able to assemble binaries for any system with any assembler, so it generally avoids platform-specific features. One exception to that is C64 PRG files.

PRG files start with a 16-bit value that tells the OS where the rest of the file should be loaded. The value is not usually part of the source code, but instead is generated by the assembler, based on the address of the first byte output. If SourceGen detects that a file is PRG, the source generators for some assemblers will suppress the first 2 bytes, and instead pass appropriate metadata (such as an additional command-line option) to the assembler.
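To make the file layout concrete, here is a minimal Python sketch that splits a PRG image into its load address and body. The address is stored little-endian (low byte first). The function name is hypothetical, and this shows only the header layout, not SourceGen's detection logic.

```python
import struct

def split_prg(data: bytes):
    """Split a PRG image into (load_address, body)."""
    if len(data) < 2:
        raise ValueError("too short to hold a PRG load address")
    (load_addr,) = struct.unpack_from("<H", data, 0)  # 16-bit little-endian
    return load_addr, data[2:]

# Bytes 01 08 give load address $0801; the remaining bytes are the body.
addr, body = split_prg(bytes([0x01, 0x08, 0xA9, 0x00]))
print(f"${addr:04X}, {len(body)} body bytes")  # prints "$0801, 2 body bytes"
```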

A file is treated as a PRG if:

The definition is sufficiently narrow to avoid most false positives. If a file is being treated as PRG and you'd rather it weren't, you can add a label or reformat the bytes. This feature is currently only enabled for 64tass.

Cross-Assembling Generated Code

After generating sources, if you have a cross-assembler executable configured, you can run it by clicking the Run Assembler button. The command-line output will be displayed, with stdout and stderr separated. (I'd prefer them to be interleaved, but that's not what the system provides.)

The output will show the assembler's exit code, which will be zero on success (note: sometimes they lie). If it appeared to succeed, SourceGen will then compare the assembler's output to the original file, and report any differences.
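The comparison step amounts to a byte-for-byte check. A minimal sketch, with a hypothetical function name and a simplified report (only the first mismatch), could be:

```python
def first_difference(original: bytes, assembled: bytes):
    """Return None if the buffers are identical, else the first mismatching offset."""
    for offset, (a, b) in enumerate(zip(original, assembled)):
        if a != b:
            return offset
    if len(original) != len(assembled):
        # One buffer is a prefix of the other; they diverge where the shorter ends.
        return min(len(original), len(assembled))
    return None

diff = first_difference(b"\xa9\x00\x60", b"\xa9\x01\x60")
print("match" if diff is None else f"first difference at offset {diff}")
```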

Failures here may be due to bugs in the cross-assembler or in SourceGen. However, SourceGen can generally work around assembler bugs, so any failure is an opportunity for improvement.

Supported Assemblers

SourceGen currently supports the following cross-assemblers:

Version-Specific Code Generation

Code generation must be tailored to the specific version of the assembler. This is most easily understood with an example.

If the code has a statement like MVN #$01,#$02, the assembler is expected to output 54 02 01, with the arguments reversed. cc65 v2.17 got it backward; the behavior was fixed in v2.18. The bug means we can't generate the same MVN/MVP instructions for both versions of the assembler.

Having version-dependent source code is a bad idea. If we generated reversed operands (MVN #$02,#$01), we'd get the correct output with v2.17, but the wrong output for v2.18. Unambiguous code can be generated for all versions of the assembler by just outputting raw hex bytes, but that's ugly and annoying, so we don't want to be stuck doing that forever. We want to detect which version of the assembler is in use, and output actual MVN/MVP instructions when producing code for versions of the assembler that don't have the bug.
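The decision can be sketched as follows. This is an illustration of the idea rather than SourceGen's generator code: the function name and version tuple are assumptions, and the `.byte` line stands in for whatever raw-hex directive the target assembler uses.

```python
MVN_OPCODE = 0x54
FIXED_IN = (2, 18)  # per the text: bug present in v2.17, fixed in v2.18

def emit_mvn(first: int, second: int, asm_version: tuple) -> str:
    """Emit 'MVN #first,#second' in a form that is safe for the given version."""
    if asm_version >= FIXED_IN:
        # The assembler encodes the operands correctly, so use the real instruction.
        return f"mvn #${first:02x},#${second:02x}"
    # Buggy assembler: write the bytes directly. The operands appear in
    # reverse order in the encoded instruction.
    return f".byte ${MVN_OPCODE:02x},${second:02x},${first:02x}"

print(emit_mvn(0x01, 0x02, (2, 17)))  # prints ".byte $54,$02,$01"
print(emit_mvn(0x01, 0x02, (2, 18)))  # prints "mvn #$01,#$02"
```

Both lines assemble to the same three bytes, so the generated source stays correct whichever version is installed.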

When you configure a cross-assembler, SourceGen runs the executable with version query arguments, and extracts the version information from the output stream. This is used by the generator to ensure that the output will work correctly with the installed assembler. If no assembler is configured, SourceGen will produce code optimized for the latest supported version of the assembler.
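Extracting a version from an assembler's output is essentially a pattern match. The sketch below assumes the version appears as dotted numbers somewhere in the text; the sample banner strings are hypothetical, and real assemblers each need their own query arguments and parsing rules.

```python
import re

def parse_version(output: str):
    """Return the first dotted version in `output` as a tuple of ints, or None."""
    m = re.search(r"[Vv]?(\d+)\.(\d+)(?:\.(\d+))?", output)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups() if part is not None)

# Hypothetical banner text, for illustration only.
print(parse_version("ca65 V2.18"))
print(parse_version("assembler version 1.56.2625"))
```

Returning a tuple lets the generator compare versions numerically, e.g. `(2, 17) < (2, 18)`.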

Assembler-Specific Bugs & Quirks

This is a list of bugs and quirky behavior in cross-assemblers that SourceGen works around when generating code.

Every assembler seems to have a different way of dealing with expressions. Most of them will let you group expressions with parentheses, but that doesn't always help. For example, PEA label >> 8 + 1 is perfectly valid, but writing PEA (label >> 8) + 1 will cause most assemblers to assume you're trying to use an alternate (and non-existent) form of PEA with indirect addressing, causing the assembler to halt with an error message. The code generator needs to understand expression syntax and operator precedence to generate correct code, but also needs to know how to handle the corner cases.

Undocumented Opcodes

The data sheet for the 6502 does not define all 256 possible opcodes. Analysis and experimentation have found that many of these "undocumented" operations actually do useful things. The people who did the research didn't always use the same mnemonic names for them, which led to a bit of confusion in assemblers.

The most authoritative source is NMOS 6510 Unintended Opcodes (PDF). The document defines a primary mnemonic and lists common aliases for each operation. SourceGen will output the primary mnemonic unless the target assembler doesn't handle it.

64tass

Tested versions: v1.53.1515, v1.54.1900, v1.55.2176, v1.56.2625 [web site]

Bugs:

Quirks:

ACME

Tested versions: v0.96.4, v0.97 [web site]

Bugs:

Quirks:

cc65

Tested versions: v2.17, v2.18 [web site]

Bugs:

Quirks:

Merlin 32

Tested versions: v1.0 [web site] [bug tracker]

Bugs:

Quirks:

Exporting Source Code

The "export" function takes what you see in the code list in the app and converts it to text or HTML. The options you've set in the app settings, such as capitalization, text delimiters, pseudo-opcode names, operand expression style, and display of cycle counts are all taken into account. The file generated is not expected to work with an actual assembler.

The text output is similar to what you'd get by copying lines to the clipboard and pasting them into a text file, except that you have greater control over which columns are included. The HTML version is augmented with links and (optionally) images.

Use File > Export to open the export dialog. You have several options:

Once you've picked your options, click either "Generate HTML" or "Generate Text", then select an output file name from the standard file dialog. Any additional files generated, such as graphics for HTML pages, will be written to the same directory.

All output uses UTF-8 encoding. Filenames of HTML files will have '#' replaced with '_' to make linking easier.

Generating Label Files

Some debuggers allow the import of labels associated with addresses. To generate such a file, use File > Generate Label File.

Select the desired output format (currently only VICE label commands are supported), and whether or not to include auto-generated labels.
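As an illustration, a VICE label file is a list of monitor "add label" commands, one per line, commonly written in the form `al C:<hex address> .<label>`. The Python sketch below produces lines in that form from a hypothetical label table; the function name and the sort order are assumptions, and the exact text SourceGen writes may differ.

```python
def vice_label_lines(labels: dict) -> list:
    """Format {name: address} pairs as VICE 'al' commands, sorted by address."""
    ordered = sorted(labels.items(), key=lambda item: (item[1], item[0]))
    return [f"al C:{addr:04x} .{name}" for name, addr in ordered]

# Hypothetical labels, for illustration only.
for line in vice_label_lines({"MainLoop": 0x0810, "Start": 0x0801}):
    print(line)
```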