SourceGen: Properties & Settings

Settings Overview

There are two kinds of settings: application settings, and project properties.

Application Settings

Application settings are stored in a file called "SourceGen-settings" in the SourceGen installation directory. If the file is missing or corrupted, default settings will be used. These settings are local to your system, and include everything from window sizes to whether or not you prefer hexadecimal values to be shown in upper case. None of them affect the way the project analyzes code and data, though they may affect the way generated assembly sources look.

The settings editor is divided into five tabs. Changes don't take effect until you hit Apply or OK.

Code View

These settings change the way the code looks on screen.

Click the Column Visibility buttons to hide columns. Click them again to restore the column to a width appropriate for the current font. A "hidden" column just has a width of zero, so with careful mouse positioning you can show and hide columns by dragging the column headers. The buttons may be more convenient though.

You can select a different font for the code list, and make it as large or as small as you want. Mono-space fonts like Courier or Consolas are recommended (and will be the only ones shown).

You can choose to display different parts of the display in upper or lower case, using the "all lower" and "all upper" buttons as a quick way to set all values. These settings are also used for generated assembly code, unless the assembler has specific case-sensitivity requirements. There is no setting for labels, which are always case-sensitive.

The Clipboard drop-down list lets you choose the format for text copied to the clipboard. The Assembler Source format includes the rightmost columns (label, opcode, operand, and comment), like assembly source code does. The Disassembly format adds the address and bytes on the left. Use the All Columns format to get all columns.

When show cycle counts for instructions is checked, every instruction line will have an end-of-line comment that indicates the number of cycles required for that instruction. If the cycle count can't be determined solely from a static analysis, e.g. an extra cycle is required if LDA (dp),Y crosses a page boundary, a '+' will be shown. In some cases the variability can be factored out if the state of certain status flags is known, e.g. 65C02 instructions that take longer in decimal mode won't be shown as ambiguous if the analyzer can determine that D=0 or D=1. This checkbox enables display in the on-screen list, but does not affect generated source code, which can be configured independently on the Asm Config tab.

Check use 'dark' color scheme to change the main disassembly list to use white text on a black background, and mute the Note highlight colors. (Most of the GUI uses standard Windows controls that take their colors from the system theme, but the disassembly list uses a custom style. You can change the rest of the UI from the Windows display "personalization" controls.)

The auto-save interval selection determines the frequency of saves to the recovery file. Setting it to disabled will disable the feature entirely, and prevent recovery files from being created (though they will still be checked for when projects are opened).

Text Delimiters

These options change the way the code list looks on screen. They do not affect generated code, which must use the delimiter characters specified by the chosen assembler.

Character and string operands are shown surrounded by quotes, e.g. LDA #'*' or .STR "Hello, world!". It's handy to be able to tell at a glance how characters are encoded, so SourceGen allows you to set the delimiters independently for every supported character encoding.

String operands may contain a mixture of text and hexadecimal values. For example, in ASCII data, the control characters for linefeed and carriage return ($0a and $0d) are considered part of the string, but don't have a printable symbol. (Unicode defines some "control picture" glyphs, but they don't look very good at smaller font sizes.)

If one of the delimiter characters appears in the string itself, the character will be output as hex to avoid confusion. For this reason, it's generally wise to use delimiter characters that aren't part of the ASCII character set, such as "curly" quotes. The Sample Characters box holds some characters that you can copy and paste (with Ctrl+C / Ctrl+V) into the delimiter fields.

For character operands, the prefix and suffix are added to the start and end of the operand. For string operands, the prefix is added to the start of the first line, and suffixes aren't allowed.

The "quick set" pop-up can be used to set the fields for a few common configurations. The default set uses curly quotes with a few prefixes and suffixes, while "straight" uses the ASCII apostrophe and double-quote characters. "Merlin" uses a format similar to what the Merlin assembler expects.

Display Format

These options change the way the code list looks on screen. They do not affect generated code.

The operand width disambiguator strings are used when the width of an instruction operand is unclear. You may specify values for all of them or none of them.

Different assemblers have different ways of forming expressions. Sometimes the rules allow expressions to be written simply, other times explicit grouping with parenthesis is required. Select whichever style you are most comfortable with. (64tass and ACME use the "common" expression style, cc65 and Merlin 32 have their own unique styles.)

The character used to start a full-line comment is usually ';', but can be changed here.

Non-unique labels are identified with a prefix character, typically '@' or ':'. The default is '@', but you can configure it to any character that isn't valid for the start of a label. (64tass uses '_' for locals, but that's a valid label start character, and so isn't allowed here.) The setting affects label editing as well as display.

If you would like your local variables to be shown with a prefix character, you can set it in the local variable prefix box.

The comma-separated format for bulk data determines whether large blocks of hex look like ABC123 or $AB,$C1,$23. The former reduces the number of lines required, the latter is more readable.

The "quick set" pop-up configures the fields on the left side of the tab to match the conventions of the specified assembler. Select your preferred assembler in the combo box to set the fields. The setting automatically changes to "custom" when you modify a field.

The add spaces in Bytes column checkbox changes the format of the hex data in the code list "bytes" column from dense (20edfd) to spaced (20 ed fd). This also affects the format of clipboard copies and exports.

Long operands, such as strings and bulk data, are wrapped to a new line after a certain number of characters. Use the wrap operands pop-up to configure the value. Larger values can make the code display more compact, but smaller values allow you to shrink the width of the operand column in the on-screen listing, moving the comment field closer in.

The adjustments that are generated for operands, such as FOO-1 or FOO+$1000, can be output as hex or decimal depending on the magnitude of the adjustment. The use hex for adjustments larger than setting determines the threshold.

Pseudo-Op

These options change the way the code list looks on screen. Assembler directives and data pseudo-opcodes will use these values. This does not affect generated source code, which always matches the conventions of the target assembler.

Enter the string you want to use for the various data formats. If a field is left blank, a default value is used.

The "quick set" pop-up configures the fields on this tab to match the conventions of the specified assembler. Select your preferred assembler in the combo box to set the fields. The setting automatically switches to "custom" when you edit a field.

Asm Config

These settings configure cross-assemblers and modify assembly source generation in various ways.

To configure an assembler, select it in the pop-up menu. The fields will initially contain assembler-specific default values. The values in the Assembler-Specific Configuration box may be configured independently for each assembler.

The "executable" box holds the full path to the cross-assembler executable.

64tass: 64tass.exe
ACME: acme.exe
cc65: bin/cl65.exe -- full installation required, with all configuration files and libraries
Merlin 32: Merlin32.exe

The column widths section allows you to specify the minimum width of the label, opcode, operand, and comment fields. If the width is less than 1, or isn't a valid number, 1 will be used. These are not hard stops: if the contents of a field are too wide, the contents of the next column will be pushed over. (The comment field width is not currently being used, but may be used to fold lines in the future.)

The next section, Code Generation Settings, affects all assemblers.

When show cycle counts in comments is checked, cycle counts are inserted into end-of-line comments. This works the same as the option in the Code View tab, but applies to generated source code rather than the on-screen display.

If you enable identify assembler in output, a comment will be added to the top of the generated assembly output that identifies the target assembler and version. It also shows the command-line options passed to the assembler. This can be very helpful if the source file is sent to other people, since it may not otherwise be obvious from the source file what the intended target assembler is, or what options are required to process the file correctly.

Some 6502 instructions have an "implied" accumulator address mode, e.g. LSR and ASL. The operand may be shown as "A" to make the address mode explicit, but it can also be omitted. Some assemblers require it to be present, some require it to be absent, most allow either. By default, the operand is shown, but enabling omit implied accumulator operands if allowed will cause it to be omitted on-screen, in the instruction chart, and in source generated for assemblers that don't require it to be present.

Labels can generally be placed either on the same line as a code or data operand, or on the line before it. Placing them on the same line makes the output a bit more compact, but if the label is longer than the label column is wide, the subsequent fields can be pushed out of alignment. The placement is configurable. Labels can be output on their own line:

Only when required - labels will not be placed on a separate line unless the assembler requires them to be.
When the label is wider than the field - labels will only be placed on a separate line when they don't fit in the label column.
Whenever possible - labels are always placed on a separate line when they are allowed to be. Most assemblers require that the label be on the same line as assignment pseudo-ops, e.g. "FOO = $1000".

Project Properties

Project properties are stored in the .dis65 project file. They specify which CPU to use, which extension scripts to load, and a variety of other things that directly impact how SourceGen processes the project. Because of the potential impact, all changes to the project properties are made through the undo/redo buffer, which means you hit "undo" to revert a property change.

The properties editor is divided into four tabs. Changes aren't pushed out to the main application until you close the dialog. Clicking Apply will capture the current changes, ensuring that they're applied even if you later hit Cancel, but the changes are not applied to the project immediately.

General

The choice of CPU determines the set of available instructions, as well as cycle costs and register widths. There are many variations on the 6502, but from the perspective of a disassembler most can be treated as one of these four:

MOS 6502. The original 8-bit instruction set.
WDC 65C02. Expanded the instruction set and smoothed some rough edges.
WDC W65C02S. An enhanced version of the 65C02, with some additional instructions introduced by Rockwell (R65C02), as well as WDC's STP and WAI instructions. The Rockwell additions overlap with 65816 instructions, so code that uses them will not work on 16-bit CPUs.
WDC W65C816S. Expanded instruction set, 24-bit address space, and 16-bit registers.

The Hudson Soft HuC6280 and Commodore CSG 4510 / 65CE02 are very similar, but they have additional instructions and some fundamental architectural changes. These are not currently supported by SourceGen.

If enable undocumented instructions is checked, some additional opcodes are recognized on the 6502 and 65C02. These instructions are not part of the chip specification, but most of them have consistent behavior and can be used. If the box is not checked, the instructions are treated as invalid and cause the code analyzer to assume that it has run into a data area. This option has no effect on the 65816.

The treat BRK as two-byte instruction checkbox determines whether BRK instructions should be handled as if they have an operand.

The entry flags determine the initial value for the processor status flag register. Code that is unreachable internally (requiring a code start point tag) will use this value. This is chiefly of value for 65816 code, where the initial value of the M/X/E flags has a significant impact on how instructions are disassembled.

If analyze uncategorized data is checked, SourceGen will attempt to identify character strings and regions that are filled with a repeated value. If it's not checked, anything that isn't detected as code or explicitly formatted as data will be shown as individual byte values.

If seek nearby targets is checked, the analyzer will try to use nearby labels for data loads and stores, adjusting them to fit (e.g. LDA LABEL+1). If not enabled, labels are not applied unless they match exactly. Note that references into the middle of an instruction or formatted data area are always adjusted, regardless of how this is set. This setting has no effect on local variables, and only enables a 1-byte backward search on project/platform symbols.

The use relocation data checkbox is only available if the project was created from a relocatable source, e.g. by the OMF Converter tool. If checked, information from the relocation dictionary will be used to improve automatic operand formatting.

If smart PLP handling is checked, the analyzer will try to use the processor status flags from a nearby PHP when a PLP is encountered. If not enabled, all flags are set to "indeterminate" following a PLP, except for the M/X flags on the 65816, which are left unmodified. (In practice this approach doesn't seem to work all that well, so the setting is un-checked by default.)

If smart PLB handling is checked, the analyzer will watch for code patterns like PLB preceded by PHK, and generate appropriate Data Bank Register changes. If not enabled, the DBR is set to the bank of the address of the start of the file, and does not change unless explicitly set. Only useful for 65816 code.

The default text encoding setting has two effects. First, it specifies which character encoding to use when searching for strings in uncategorized data. Second, if an assembler has a notion of preferred character encoding (e.g. you can default string operands to PETSCII), this setting will determine which encoding is preferred in generated sources.

The min chars for string detection setting determines how many printable characters (based on the default text encoding setting) need to appear consecutively for the data analyzer to decide it's a string. Shorter values are prone to false-positive identifications, longer values miss out on short strings. You can also set it to "none" to disable automatic string identification.

The auto-label style setting determines the format for labels that are generated automatically. By default the label will be the letter 'L' followed by the hexadecimal address, but the label can be annotated based on usage. For example, addresses that are the target of branch instructions can be labeled with the letter 'B'.

Project Symbols

You can add, edit, and delete individual symbols and constants. See the symbols section for an explanation of how project symbols work.

The Edit Symbol button opens the Edit Project Symbol dialog, which allows changing any part of a symbol definition. You're not allowed to create two symbols with the same label.

The Import button allows you to import symbols from another project. Only labels that have been tagged as global and exported will be imported. Existing symbols with identical labels will be replaced, so it's okay to run the importer multiple times. Labels that aren't found will not be removed, so you can safely import from multiple projects, but will need to manually delete any symbols that are no longer being exported.

Shortcut: you can open the project properties window with the Project Symbols tab selected by hitting F6 from the main code list.

Symbol Files

From here, you can add and remove platform symbol files, or change the order in which they are loaded. See the symbols section for an explanation of how platform symbols work, and the advanced topics section for a description of the file syntax.

Platform symbol files must live in the RuntimeData directory that comes with SourceGen, or in the directory where the project file lives. This is mostly to keep things manageable when projects are distributed to other people, but also acts as a minor security check, to prevent a wayward project from trying to open files it shouldn't.

Click one of the "Add Symbol Files" buttons to include one or more symbol files in the project. The Add Symbol Files from Runtime button sets the directory to the SourceGen RuntimeData directory, while Add Symbol Files from Project starts in the project directory. If you haven't yet saved the project, the latter button will be disabled. The only difference between the buttons is the initial directory.

In the list, files loaded from the RuntimeData directory will be prefixed with RT:. Files loaded from the project directory will be prefixed with PROJ:.

If a platform symbol file can't be found when the project is opened, you will receive a warning.

Extension Scripts

From here, you can add and remove extension script files. See the extension scripts section for details on how extension scripts work.

Extension script files must live in the RuntimeData directory that comes with SourceGen, or in the directory where the project file lives. This is mostly to keep things manageable when projects are distributed to other people, but also acts as a minor security check, to prevent a wayward project from trying to open files it shouldn't.

Click one of the "Add Scripts" buttons to include one more scripts in the project. The Add Scripts from Runtime button sets the directory to the SourceGen RuntimeData directory, while Add Scripts from Project starts in the project directory. If you haven't yet saved the project, the latter button will be disabled. The only difference between the buttons is the initial directory.

In the list, files loaded from the RuntimeData directory will be prefixed with RT:. Files loaded from the project directory will be prefixed with PROJ:.

If an extension script file can't be found when the project is opened, you will receive a warning.