Newspeak


In the end the whole notion of goodness and badness will be covered by only six words -- in reality, only one word. Don't you see the beauty of that, Winston?
Nineteen eighty-four, George Orwell.



Newspeak is a simplified programming language, well suited to static analysis.

Software

Distribution

Newspeak v. 1.7 source code is available: newspeak-1.7.tar.gz. It is distributed under the LGPL. Newspeak is also available at SourceForge.

Previous versions: newspeak-1.6.tar.gz, newspeak-1.5.tar.gz, newspeak-1.4.tar.gz, newspeak-1.3.tgz, newspeak-1.2.tgz, C2Newspeak-1.1.tgz, C2Newspeak-1.0.tgz, C2Newspeak-0.9.tgz.

Requirements

Newspeak utilities are written in Objective Caml.

Documentation

Development version

The latest version of the source code can also be retrieved from this Mercurial repository: http://hg.penjili.org/c2newspeak-ref. Mercurial is a distributed source control management tool, available at http://www.selenic.com/mercurial/wiki/.

Bug reports

The code can be browsed here, and tickets can be submitted there to report bugs, comments, or missing features.

Examples

Legend

Here are a few compilation examples from C to Newspeak. In each example below, the C code comes first, followed by the corresponding Newspeak code.

Types

Integer types are normalized according to their size and sign. Their size in bits, which is architecture-dependent, is made explicit.
int i1;
unsigned int i2;
char i3;
unsigned char i4;
int32 i1;
uint32 i2;
int8 i3;
uint8 i4;
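
The normalization above can be reproduced by hand on any C compiler. The following is a minimal sketch, not part of the Newspeak tools: the macros INT_BITS, INT_IS_SIGNED and SHOW_INT_TYPE and the function show_int_types are hypothetical names chosen for this illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Width of an integer type in bits on the platform compiling this file. */
#define INT_BITS(type) ((unsigned)(sizeof(type) * CHAR_BIT))

/* Non-zero when the type is signed: converting -1 to an unsigned type
   yields its largest value, so the comparison only holds for signed types. */
#define INT_IS_SIGNED(type) ((type)-1 < (type)1)

/* Hypothetical helper: prints one line of the mapping from a C type
   to a name built from its signedness and its width in bits. */
#define SHOW_INT_TYPE(type) \
    printf("%-14s -> %sint%u\n", #type, \
           INT_IS_SIGNED(type) ? "" : "u", INT_BITS(type))

/* Prints the mapping for the four declarations of the example above. */
void show_int_types(void)
{
    SHOW_INT_TYPE(int);
    SHOW_INT_TYPE(unsigned int);
    SHOW_INT_TYPE(char);
    SHOW_INT_TYPE(unsigned char);
}
```

On a target with 32-bit int and a signed 8-bit char, calling show_int_types() yields the same four names as the Newspeak code above; other targets may give different widths, or a different sign for plain char.
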

Casts (and unions) in C allow programmers to manipulate sequences of bytes with any type. Consequently, Newspeak distinguishes only two types of pointers: data and function pointers.
int *p1;
unsigned int *p2;
int (*p3)[10];
struct { int x; } *p4;
int (*fp)(int);
ptr p1;
ptr p2;
ptr p3;
ptr p4;
fptr fp;
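
To see why the pointed-to type carries little information, consider the small C sketch below. It reads the same bytes under different types; read_as_uint and byte_sum are hypothetical helpers written for this illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Reads the first sizeof(unsigned int) bytes at p as an unsigned int,
   whatever the declared type of the object. memcpy is used instead of
   a pointer cast to stay clear of alignment and aliasing problems. */
unsigned int read_as_uint(const void *p)
{
    unsigned int v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sums the n bytes at p, viewing the object as a plain byte sequence. */
unsigned long byte_sum(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += bytes[i];
    return sum;
}
```

Both helpers accept the address of an int, an array or a structure alike, which is the situation a single ptr type accounts for.
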

Newspeak composite data structures are arrays and regions. A region is a sequence of bits in which some offsets are marked as storing values of a given type. Regions can encode both C structures and unions while making their architecture-dependent parameters explicit: field offsets, padding, and overall type size.
int t[10];
struct {
int x; char y; char* z;
} s;
union {
int x; char y; char* z;
} u;

int t1[10][20];
int t2[10][20][30];

struct {
int x; struct { char z; } y;
} s1;
struct {
int x[10];
struct { char z[10]; } y[10];
} s2;
struct {
int z;
union { int x; char y; } t;
} s3;
int32[10] t;
{
int32 0; int8 32; ptr 64;
}96 s;
{
int32 0; int8 0; ptr 0;
}32 u;

int32[20][10] t1;
int32[30][20][10] t2;

{
int32 0; { int8 0; }8 32;
}64 s1;
{
int32[10] 0;
{ int8[10] 0; }80[10] 320;
}1120 s2;
{
int32 0;
{ int32 0; int8 0; }32 32;
}64 s3;
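
The offsets and sizes shown in the regions above correspond to a target with 32-bit int and 32-bit pointers. As a rough sketch, the same parameters can be queried from a C compiler with offsetof and sizeof; the tags s and u and the macros BIT_OFFSET and BIT_SIZE are names introduced for this illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* The first structure and union of the example above, given tags so
   that offsetof can refer to them. */
struct s { int x; char y; char *z; };
union u { int x; char y; char *z; };

/* Offset of a field and size of a type, both in bits. */
#define BIT_OFFSET(type, field) ((unsigned)(offsetof(type, field) * CHAR_BIT))
#define BIT_SIZE(type) ((unsigned)(sizeof(type) * CHAR_BIT))

/* Prints the layout chosen by the current compiler and target. */
void show_layout(void)
{
    printf("struct s: x at %u, y at %u, z at %u, size %u\n",
           BIT_OFFSET(struct s, x), BIT_OFFSET(struct s, y),
           BIT_OFFSET(struct s, z), BIT_SIZE(struct s));
    printf("union u:  x at %u, y at %u, z at %u, size %u\n",
           BIT_OFFSET(union u, x), BIT_OFFSET(union u, y),
           BIT_OFFSET(union u, z), BIT_SIZE(union u));
}
```

All union fields sit at offset 0 on every target, while the structure's padding and total size depend on the architecture; a 64-bit target will typically report larger sizes than the ones listed above.
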

Variables

Global and local variables are now both designated by their names.
int x;
void main() {
int y;
int z;
x = y;
x = z;
}
int32 x;
void main(void) {
int32 y;
int32 z;
x =(int32) y_int32;
x =(int32) z_int32;
}

Left values and expressions

Fields and array elements are accessed by shifting the structure or array address by some offset. For array element accesses, the belongs operator makes it possible to check that the index is within bounds.
struct {
int a; int b;
} x;
int t[10];
int i;

x.b =
t[i];
{
int32 0; int32 32;
}64 x;
int32[10] t;
int32 i;

x + 32 =(int32)
t + (belongs[0,9] (i_int32) * 32)_int32;
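
The offset computation for t[i] can be modelled in C as follows. This is a sketch under assumptions: belongs and element_bit_offset are hypothetical functions, and returning false on an out-of-bounds index is only one way to model a failed check (a static analyzer would report an alarm instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models belongs[lo,hi]: true when i lies in the closed interval. */
bool belongs(int64_t lo, int64_t hi, int64_t i)
{
    return lo <= i && i <= hi;
}

/* Computes the bit offset of element i in an array of len elements of
   elem_bits bits each. Returns false, leaving *offset untouched, when
   the index is out of bounds. */
bool element_bit_offset(int64_t len, int64_t elem_bits, int64_t i,
                        int64_t *offset)
{
    assert(len > 0 && elem_bits > 0);
    if (!belongs(0, len - 1, i))
        return false;
    *offset = i * elem_bits;
    return true;
}
```

With len = 10 and elem_bits = 32, as for int32[10] t, index 3 gives a bit offset of 96, while indices -1 and 10 are rejected.
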
Integer operations are decomposed into an exact operation followed by a coercion back to the result's expected range.
int x, y, z;

x = y + z;
x = y * z;
int32 x; int32 y; int32 z;

x =(int32) coerce[-2147483648,2147483647] (y_int32 + z_int32);
x =(int32) coerce[-2147483648,2147483647] (y_int32 * z_int32);
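
A possible C reading of this decomposition is sketched below: the operation is carried out exactly in a wider type, then the result is checked against the target range. The names in_range, add_int32 and mul_int32 are hypothetical, and treating an out-of-range result as a reported failure is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the range test of coerce[lo,hi] on an exact value v. */
bool in_range(int64_t lo, int64_t hi, int64_t v)
{
    return lo <= v && v <= hi;
}

/* x = y + z: exact sum in 64 bits, then coercion back to 32 bits. */
bool add_int32(int32_t y, int32_t z, int32_t *x)
{
    int64_t exact = (int64_t)y + (int64_t)z; /* always fits in 64 bits */
    if (!in_range(INT32_MIN, INT32_MAX, exact))
        return false;
    *x = (int32_t)exact;
    return true;
}

/* x = y * z: exact product in 64 bits, then coercion back to 32 bits. */
bool mul_int32(int32_t y, int32_t z, int32_t *x)
{
    int64_t exact = (int64_t)y * (int64_t)z; /* always fits in 64 bits */
    if (!in_range(INT32_MIN, INT32_MAX, exact))
        return false;
    *x = (int32_t)exact;
    return true;
}
```

Separating the two steps makes the overflow condition explicit: add_int32(INT32_MAX, 1, &x) fails at the range test rather than relying on C's undefined signed overflow.
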
The coerce operator is also used for casts between integers of different size or sign.
Pointer creations are annotated with the size of the buffer they designate, so that invalid pointer operations can be checked.
int* x;
int t[100];

x = &t[3];
x = x + 5;
*x = 3;
ptr x;
int32[100] t;

x =(ptr) (focus3200 &(t) + 96);
x =(ptr) (x_ptr + 160);
[x_ptr]32 =(int32) 3;
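
One way to picture what the size annotation enables is a pointer model that remembers the size of its buffer. The sketch below is an illustration only: model_ptr, focus, shift and can_access are hypothetical names, and offsets are counted in bits as in the Newspeak code above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a data pointer: a bit offset into a buffer
   whose size in bits was recorded when the pointer was created. */
typedef struct {
    int64_t size_bits;
    int64_t offset_bits;
} model_ptr;

/* Models focusN &(buffer): a pointer to the start of an N-bit buffer. */
model_ptr focus(int64_t size_bits)
{
    assert(size_bits >= 0);
    model_ptr p = { size_bits, 0 };
    return p;
}

/* Models pointer arithmetic: shifts p by a possibly negative bit count. */
model_ptr shift(model_ptr p, int64_t delta_bits)
{
    p.offset_bits += delta_bits;
    return p;
}

/* True when width_bits bits can be accessed through p without leaving
   the buffer it was created from. */
bool can_access(model_ptr p, int64_t width_bits)
{
    return p.offset_bits >= 0 && p.offset_bits <= p.size_bits
        && width_bits >= 0 && width_bits <= p.size_bits - p.offset_bits;
}
```

Following the example above, shift(shift(focus(3200), 96), 160) sits at bit offset 256, so a 32-bit store through it stays inside the 3200-bit buffer.
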
Casts between integers and pointers are accepted by default, with the warning "dirty cast from pointer to integer".
int* p;
int x;
x = p;
ptr p;
int32 x;
x =(int32) (int32) p_ptr;
When option --reject-dirty-cast is set, the same code is rejected with a fatal error:
int* p;
int x;
x = p;
Fatal error: test.c:5#0: dirty cast from pointer to integer, rewrite your code or remove option --reject-dirty-cast

Commands

Conditionals are translated into alternative choice commands, à la Dijkstra.
int x;
if (x < 10) {
x++;
}
int32 x;
choose {
-->
guard((10 > x_int32));
x =(int32) coerce[-2147483648,2147483647] (x_int32 + 1);
-->
guard(! (10 > x_int32));
}
Function return statements are replaced by jumps and labels.
int main() {
int x;
if (x < 10) {
return 1;
}
return 0;
}
int32 main(void) {
int32 x;
do {
choose {
-->
guard((10 > x_int32));
!return =(int32) 1;
goto lbl0;
-->
guard(! (10 > x_int32));
}
!return =(int32) 0;
} with lbl0: {
}
}
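
The same rewriting can be written directly in C, which may help in reading the Newspeak output. In this sketch the function names are hypothetical, x is made a parameter so that the code has defined behaviour, and the local variable result plays the role of !return.

```c
/* Source form: two return statements. */
int below_ten(int x)
{
    if (x < 10) {
        return 1;
    }
    return 0;
}

/* Lowered form: each return becomes an assignment to a result variable,
   followed by a jump to a label placed at the end of the body. The last
   assignment needs no jump because it already reaches the label. */
int below_ten_lowered(int x)
{
    int result;
    if (x < 10) {
        result = 1;
        goto lbl0;
    }
    result = 0;
lbl0:
    return result;
}
```

Both functions return the same value for every argument; only the control flow is made explicit.
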
Loops are built from a combination of alternatives, jumps, and the infinite loop.
int x;
x = 0;
while (x < 10) {
x++;
}
int32 x;
x =(int32) 0;
do {
while (1) {
choose {
-->
guard((10 > x_int32));
-->
guard(! (10 > x_int32));
goto lbl1;
}
x =(int32) coerce[-2147483648,2147483647] (x_int32 + 1);
}
} with lbl1: {
}
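
Written back in C, the shape of this translation looks roughly like the sketch below. The function names and the limit parameter are assumptions added for the illustration; the if/else stands in for the two guarded branches of choose.

```c
/* Source form: an ordinary while loop. */
int count_up(int x, int limit)
{
    while (x < limit) {
        x++;
    }
    return x;
}

/* Lowered form: an infinite loop whose exit is a jump taken in the
   branch where the loop condition does not hold. */
int count_up_lowered(int x, int limit)
{
    while (1) {
        if (x < limit) {
            /* guard holds: fall through to the loop body */
        } else {
            goto lbl1; /* negated guard: leave the loop */
        }
        x++;
    }
lbl1:
    return x;
}
```

With x = 0 and limit = 10, as in the example above, both versions stop with x equal to 10.
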
The representation of function calls sticks to the source code:
int f(int a, int b) {
return a + b;
}

void main() {
int x, y, z;
z = f(x, y);
}
int32 f(int32 a, int32 b) {
!return =(int32) coerce[-2147483648,2147483647]
(a_int32 + b_int32);
}

void main(void) {
int32 x; int32 y; int32 z;
z <- f(x_int32, y_int32);
}
There is much more! Feel free to experiment and let us know your thoughts.