Much has been made, in the past and again more recently, of the lack of a compiler switch in Delphi 2009 to govern the behaviour of the String type.
CodeGear have repeatedly said that it was not possible/practical to provide such a switch, but their advice to anyone concerned about a unilateral change from ANSI to Unicode string in their applications itself suggests that a switch was not only possible, but actually very simple to incorporate. So much so that they could provide it even now without having to change anything already delivered in Delphi 2009 or committed to for Delphi 2010.
Let me explain what I mean.
It’s Just Not Possible
When asserting that it is impossible to provide a switch to govern String type, there has I think been an assumption that it was expected that this switch would also affect the VCL and RTL. Hands are thrown in the air and there is much wailing and gnashing of teeth about the impracticalities of providing two versions of the VCL and RTL – one ANSI and one Unicode.
But I believe that most developers are happy or at least willing to accept that the VCL and RTL should make/have made the transition to Unicode – it is their own application code that is of concern. Even for those who might ideally prefer a switchable or dual-VCL etc, a situation where application code can be more easily migrated whilst preserving ANSI’ness where needed is likely to be more palatable than the current “Unicode or Bust” predicament that they find themselves in.
But in respect of application code (by which I mean code that is not provided as part of the VCL/RTL, so this would also encompass 3rd party component code for which you have the source), the standard advice for anyone concerned about String variables becoming UnicodeString in Delphi 2009 is this:
Find all declarations of String and Char and related types in your code, and change them to ANSIString and ANSIChar etc.
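As a minimal sketch of what that advice means in practice, consider a small routine after the manual edit. CountChar is a hypothetical helper invented for illustration, not something taken from any real codebase; the original declaration is shown in the comment.

```pascal
program ManualAnsiMigration;
{$APPTYPE CONSOLE}

// Hypothetical application routine, used only for illustration.
// Before the migration it would typically have been declared as:
//   function CountChar(const S: String; C: Char): Integer;
// Following the standard advice, the generic types are replaced by
// their explicit ANSI equivalents so the routine means the same
// thing in Delphi 2009 as it did in earlier versions.
function CountChar(const S: AnsiString; C: AnsiChar): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if S[I] = C then
      Inc(Result);
end;

begin
  WriteLn(CountChar('banana', 'a')); // prints 3
end.
```

Every declaration in the unit needs the same treatment, which is where the labour (and the difficulty of reversing the change later) comes from.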
Although the source for the VCL is provided with Delphi, making modifications – directly – to that source is not always straightforward and of course not officially supported and future releases of the VCL source will make no attempt to accommodate “local” revisions to VCL source.
So in general, whilst it may be possible it is not “a good idea” and in any case, changing the formal types of string type variables cannot – practically – extend to the VCL or the RTL units.
The Long (Hand) and Short (Hand) Of It
We can conceptually wind the clock back, to the time of the introduction of LongString in Delphi. At that time a switch was provided and the VCL/RTL worked with either Short or Long String types. But let us instead imagine that the same arguments w.r.t “Dual VCL’s” had applied to LongString and that the VCL/RTL had unilaterally “gone LongString”.
If the VCL/RTL were declared to be exclusively “LongString” capable, then each unit in the VCL/RTL would simply have needed a compiler directive to override any project settings:
{$H+}
or alternatively (and more clearly)
{$LONGSTRINGS ON}
In individual applications, the lack of any specific directive in a unit would allow the project specific setting of this compiler switch to prevail, but the embedding of the directive in the VCL/RTL units would ensure that those would remain “LongString”.
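The unit-level override relies on $LONGSTRINGS being a local switch, so its effect can be seen even within a single source file. The following is only a sketch to illustrate that scoping; the two type names are invented for the example.

```pascal
program LongStringSwitch;
{$APPTYPE CONSOLE}

{$LONGSTRINGS OFF}
type
  // Declared while the switch is OFF: "string" means ShortString here.
  TShortByDirective = string;

{$LONGSTRINGS ON}
type
  // Declared while the switch is ON: "string" is the long
  // (reference-counted, pointer-sized) string type.
  TLongByDirective = string;

begin
  // A ShortString is a length byte plus 255 characters.
  WriteLn(SizeOf(TShortByDirective)); // prints 256
  // A long string variable is just a pointer to the string data.
  WriteLn(SizeOf(TLongByDirective) = SizeOf(Pointer)); // prints TRUE
end.
```

A directive placed at the top of a unit works the same way: it wins over whatever the project options say, for that unit only.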
It seems to me that the same approach could be extended quite simply to the String type differences introduced by the transition to Unicode:
{$STRINGTYPE UNICODE} // String = UnicodeString, Char = WideChar etc
{$STRINGTYPE ANSI} // String = ANSIString, Char = ANSIChar etc
The absence of a $STRINGTYPE directive would of course allow the project setting to apply. In VCL/RTL units the directive would be present to ensure UNICODE support in the VCL/RTL as currently assumed in the Delphi 2009 source. The current assumption would simply be made explicit.
NOTE: The existing $LONGSTRINGS directive would have an effect only when $STRINGTYPE is ANSI.
In other words, a $STRINGTYPE directive would provide a way to do in a single line (in a quickly and easily reversible fashion) exactly what CodeGear currently recommend is done in a more laborious and more difficult to reverse fashion – i.e. manually changing the types of string declarations in application code.
A Small, Specific Example
In one project I am involved in, the FastStrings unit is used extensively throughout the application (reflecting the fact that the application has a long and illustrious history).
The FastStrings unit simply does not compile in Delphi 2009. It uses “String” and “Char” declarations, and some of its inline assembler code makes assumptions about element sizes that become invalid when those types resolve to UnicodeString and WideChar rather than the ANSI types that FastStrings assumes.
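The element size assumption is easy to demonstrate without any assembler. This sketch is not taken from FastStrings; it simply shows how the byte size of the same three-character text differs depending on which types "String" and "Char" resolve to.

```pascal
program ElementSize;
{$APPTYPE CONSOLE}

var
  S: string;      // UnicodeString in Delphi 2009, AnsiString before
  A: AnsiString;  // always one byte per element

begin
  S := 'abc';
  A := 'abc';

  // Code that treats Length(S) as a byte count silently assumes
  // SizeOf(Char) = 1, which no longer holds in Delphi 2009.
  WriteLn(Length(S) * SizeOf(Char));     // 6 in Delphi 2009, 3 in earlier versions
  WriteLn(Length(A) * SizeOf(AnsiChar)); // 3 in every version
end.
```

Hand-written assembler that steps through a string one byte at a time carries the same assumption, but the compiler cannot see it.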
The solution – currently – is to go through the FastStrings unit replacing “String” with “ANSIString”, “Char” with “ANSIChar” etc.
If a $STRINGTYPE directive were available, the exact same result could be achieved by simply adding:
{$STRINGTYPE ANSI}
at the top of that unit. This unit would then compile exactly as it did in previous versions of Delphi and its programming interface to the “outside world” (other code) would reflect its ANSI specific nature, just as it would had the declarations all been laboriously and manually modified.
In the meantime, the VCL would continue to be Unicode, the RTL would continue to be Unicode and application code using the FastStrings unit would continue to use whatever String type was applicable in each case.
Even without such a directive added to the top of the FastStrings unit itself, if an application were compiled in Delphi 2009 with its project settings configured with STRINGTYPE = ANSI, it too would compile with FastStrings, but again the VCL and RTL would remain resolutely Unicode.
It is important to remember of course that suitable warnings/errors would still be presented if ANSI application code were found to cause difficulties where string data is passed between application code and the VCL/RTL, but this situation again is exactly the same as if that application code had been manually modified to an entirely and explicitly ANSI base.
The difference is, you could quickly check either across an entire application or in an individual unit, what the final outcome would be of compiling for ANSI/UNICODE, without having to go to so much trouble.
NOTE: As my investigation into string performance in Delphi 2009 last year revealed, much of the benefit of the FastStrings library appears to be superseded in Delphi 2009, but not all. Its use here is to provide an example of how a compiler switch would assist in easily and safely migrating even extremely “String sensitive” code, not to provide a concrete example of where the use of such a switch would necessarily be the optimal strategy in any one particular case.
I agree with your thoughts completely. But I would prefer that the directive be more of a bracketing instruction – i.e. $ANSISTRING ON $ANSISTRING OFF. This new compiler directive would simply textually change “string” to “AnsiString” in the parser when the code is enclosed in an $ANSISTRING ON block – nothing more, nothing less – i.e. the result would be exactly as if you had changed the word “string” to “AnsiString” yourself everywhere in the blocks where this directive is ON.
The VCL/RTL stays exactly as it is in D2009.
Hi Richard. Thanks for the comments.
I presume you mean also that there would be no project option to set a project-wide behaviour?
i.e. that ANSISTRING would be OFF by default and always OFF unless explicitly set “ON” in a unit, and remain ON only for that unit, so that RTL/VCL units would be UNICODE without requiring a specific directive to enforce that.
I think most people wishing for a switch have a scenario in mind where, when they open an old project in Delphi 2009 for the first time, their current code will retain its ANSI-ness (the project setting will be ANSI for old projects, UNICODE for new ones) unless and until specified otherwise.
But still, being able to add a simple directive to any/all application units would be less work than having to change all variable declarations. It would however be slightly more work to then remove that directive if/when a project completes a migration to Unicode readiness, and/or is suitable for use in /either/ ANSI or Unicode projects.
i.e. a unit containing additional string utility routines must still provide both ANSI and Unicode APIs if it is to be used in a project that may wish to use both string types.
But if it is a unit that will be used /either/ in an ANSI project /or/ in a Unicode project, then being adaptable to a project specified option would be useful.
I imagine that in your proposed approach, people would quickly start providing their own “project setting”:
{$ifdef UNICODE}
{$ANSISTRING OFF}
{$endif}
and then add UNICODE (or whatever) to their project conditionals, to get the same behaviour that they could have had with a project level setting. Unit specific behaviour – where required – is really no different in either case.
Another potential problem is that in older versions of Delphi the compiler would halt on an unrecognized/unsupported directive, so having added $ANSISTRING OFF to all application units, those units could not then (as) easily be shared with projects still compiling with older Delphi versions. More contortions would be needed in the directives added to a unit.
The need to explicitly force UNICODE is I think likely to be less common (in a “sharing with older versions of Delphi” context) than the wish to have code be adaptable to either ANSI or UNICODE as required in a project.
But we each obviously view the need from our own particular circumstances and project needs, so we are likely to see different advantages and disadvantages in either approach. 🙂
One might wonder why it is an error to do
type
string = ANSIstring;
while it is OK to have
type
integer = ANSIstring;
Not that it actually would help in the string switch case, as the type definition is inherited into all subsequently used units.
Now that starts me wondering about another thing – if unitA does
type integer = shortint;
uses unitC;
and unitB does
type integer = longint;
uses unitC;
what will actually happen?
Does this also mean it would be possible to target Win95/98/ME again?
@Anders:
Your first question is easily answered: “String” is a reserved word, hence the error, whereas Integer is not.
As for the second question – or speculation really – I’m not sure what you mean by “type definition inherited into subsequently used units”.
If I understood your example correctly, (both UnitA and UnitB defining “integer” differently and then using UnitC), then UnitC will simply treat “Integer” as the normal system defined 32 bit signed type.
The type Integer only changes within UnitC if UnitC itself uses either UnitA or UnitB, in which case the normal rules of uses clause precedence would determine which “Integer” is referenced, if not qualified by a unit name.
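A single-file sketch of the same scoping rule may help. Here the redefinition lives in the program itself rather than in a used unit (an assumption made purely to keep the example self-contained), but the principle is the same: the nearest declaration wins unless the name is qualified by a unit name.

```pascal
program ShadowInteger;
{$APPTYPE CONSOLE}

type
  // Legal, because Integer is a predeclared identifier in the System
  // unit and not a reserved word. The equivalent declaration for
  // "string" is rejected by the compiler.
  Integer = ShortInt;

begin
  // The unqualified name now refers to the local redefinition.
  WriteLn(SizeOf(Integer));        // prints 1
  // Qualifying with the unit name reaches the original type.
  WriteLn(SizeOf(System.Integer)); // prints 4
end.
```

Any other unit that does not use the redefining unit is unaffected and continues to see System.Integer.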
@Ken: Unfortunately not, since the VCL and RTL would still be Unicode.
This envisaged implementation of a switch would only enable application code to be ANSI/Unicode switchable primarily as a short cut to making it ANSI for projects where that is necessary or desirable if only as a stepping stone to an eventual migration to Unicode.
@Jolyon: Thanks. It’s an issue in the engineering world. Lots of extremely old programs still get used because the machines they serve are still in service. I am aware of UNICOWS.DLL, but for now using D7/D2007 is easier.
And besides, if the generics classes are buggy in D2009 (which is what I hear in various places), I have no reason to switch just yet.
@Jolyon:
Yes – I know string is a reserved word, I was just mildly wondering why…
Second part: Use case error – I got it upside down. My unitB used unitA in the implementation part, which confused me (I just grabbed some project and played with it).
Anders, sorry – didn’t mean to teach Granny to suck eggs, as they say. 🙂
I’m thinking that “String” gets special treatment as a reserved word because “String” gets special treatment as a type.
Its “specialness” has perhaps increased and become more complex over the years, but it always was handled in a very specific way by Delphi and to be fair probably by Pascal more generally.
There may be technical solutions possible to allow a user to define their own formal “string” type, but the practical benefit probably isn’t worth it.