In the end the whole notion of goodness and badness will be covered by
only six words -- in reality, only one word. Don't you see the beauty
of that, Winston?
Nineteen Eighty-Four, George Orwell.
Newspeak is a
simplified programming language, well-suited for the purpose of static
analysis.
Software
- c2newspeak compiles C programs into Newspeak.
- ada2newspeak compiles Ada programs into Newspeak.
- npkstats computes statistics about Newspeak programs.
- npkpointer performs pointer analysis on Newspeak programs.
- ...
Distribution
Newspeak v. 1.7 source code is available:
newspeak-1.7.tar.gz.
It is distributed under the LGPL. Newspeak is also available at
SourceForge.
Previous versions:
newspeak-1.6.tar.gz,
newspeak-1.5.tar.gz,
newspeak-1.4.tar.gz,
newspeak-1.3.tgz,
newspeak-1.2.tgz,
C2Newspeak-1.1.tgz,
C2Newspeak-1.0.tgz,
C2Newspeak-0.9.tgz.
Requirements
Newspeak utilities are written in
Objective Caml.
Documentation
Development version
The latest version of the source code
can also be retrieved from this Mercurial
repository:
http://hg.penjili.org/c2newspeak-ref.
Mercurial is a
distributed source management tool, which can be found at
http://www.selenic.com/mercurial/wiki/.
Bug reports
The code can be browsed
here,
and tickets can be submitted
there
to report bugs, leave comments, or request missing features.
Examples
Legend
Here are a few examples of compilation from C to Newspeak. In each example
below, the C code is listed first, followed by the corresponding Newspeak
code:
Types
Integer types are normalized according to their size and sign. Their size in
number of bits, which is architecture dependent, is made explicit.
int i1;
unsigned int i2;
char i3;
unsigned char i4;
int32 i1;
uint32 i2;
int8 i3;
uint8 i4;
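As a rough illustration of this normalization, the sketch below derives a Newspeak-style type name from a C type's byte size and signedness on the host platform. The helper names `npk_int_bits` and `npk_int_name` are hypothetical and are not part of c2newspeak.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdio.h>    /* snprintf */

/* Hypothetical helper: width in bits of a type that occupies `bytes` bytes. */
unsigned npk_int_bits(size_t bytes) {
    return (unsigned)(bytes * CHAR_BIT);
}

/* Hypothetical helper: writes a name such as "int32" or "uint8" into buf
   and returns buf. The name depends only on size and signedness. */
const char *npk_int_name(char *buf, size_t len, size_t bytes, int is_signed) {
    snprintf(buf, len, "%sint%u", is_signed ? "" : "u", npk_int_bits(bytes));
    return buf;
}
```

For instance, `npk_int_name(buf, sizeof buf, sizeof(int), 1)` yields "int32" on a platform where int occupies four 8-bit bytes, which is the architecture the examples here appear to assume.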
Casts (and unions) in C allow programmers to manipulate sequences of bytes
with any type. Consequently, Newspeak distinguishes only two
types of pointers: data and function pointers.
int *p1;
unsigned int *p2;
int (*p3)[10];
struct { int x; } *p4;
int (*fp)(int);
ptr p1;
ptr p2;
ptr p3;
ptr p4;
fptr fp;
Newspeak composite data structures are arrays and regions. A region is a
sequence of bits in which selected offsets are declared to hold values
of a given type. Regions can encode both C structures and unions,
while making their architecture-dependent parameters explicit: namely,
field offsets, padding and the overall size of the type.
int t[10];
struct {
int x; char y; char* z;
} s;
union {
int x; char y; char* z;
} u;
int t1[10][20];
int t2[10][20][30];
struct {
int x; struct { char z; } y;
} s1;
struct {
int x[10];
struct { char z[10]; } y[10];
} s2;
struct {
int z;
union { int x; char y; } t;
} s3;
int32[10] t;
{
int32 0; int8 32; ptr 64;
}96 s;
{
int32 0; int8 0; ptr 0;
}32 u;
int32[20][10] t1;
int32[30][20][10] t2;
{
int32 0; { int8 0; }8 32;
}64 s1;
{
int32[10] 0;
{ int8[10] 0; }80[10] 320;
}1120 s2;
{
int32 0;
{ int32 0; int8 0; }32 32;
}64 s3;
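The offsets and sizes that a region makes explicit can be observed from C itself. The following sketch, which is not taken from c2newspeak, uses the standard `offsetof` and `sizeof` to compute them in bits on the host platform; the macros `BIT_OFFSET` and `BIT_SIZE` and the function `print_s_layout` are names introduced here for illustration.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>    /* printf */

/* Bit offset of a field and bit size of a type on the host platform. */
#define BIT_OFFSET(type, field) (offsetof(type, field) * CHAR_BIT)
#define BIT_SIZE(type)          (sizeof(type) * CHAR_BIT)

struct s { int x; char y; char *z; };
union  u { int x; char y; char *z; };

/* Prints the layout of struct s in a region-like notation (illustrative only). */
void print_s_layout(void) {
    printf("{ x %zu; y %zu; z %zu; }%zu\n",
           BIT_OFFSET(struct s, x), BIT_OFFSET(struct s, y),
           BIT_OFFSET(struct s, z), BIT_SIZE(struct s));
}
```

On a platform with 32-bit int and 32-bit pointers this prints offsets 0, 32 and 64 and a total size of 96, matching the region shown for s. With 64-bit pointers the total size is typically 128, which is why the layout has to be fixed at compilation time. Every field of the union sits at offset 0.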
Variables
Global and local variables are now both designated by their name.
int x;
void main() {
int y;
int z;
x = y;
x = z;
}
int32 x;
void main(void) {
int32 y;
int32 z;
x =(int32) y_int32;
x =(int32) z_int32;
}
Left values and expressions
Fields and array elements are accessed by shifting the address of the
structure or array by some offset.
For array element accesses, the
belongs operator checks that
the index is within bounds.
struct {
int a; int b;
} x;
int t[10];
int i;
x.b =
t[i];
{
int32 0; int32 32;
}64 x;
int32[10] t;
int32 i;
x + 32 =(int32)
t + (belongs[0,9] (i_int32) * 32)_int32;
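To make the role of belongs concrete, here is a small C model of the offset computation for `t[i]`. It is only a sketch: the functions `belongs` and `elem_offset_bits` are hypothetical, and reporting an out-of-range index through a boolean is an assumption made here, not a description of how Newspeak tools react.

```c
#include <stdbool.h>

/* Hypothetical model of belongs[lo,hi]: succeeds only when lo <= i <= hi,
   in which case the index is passed through unchanged. */
bool belongs(long lo, long hi, long i, long *out) {
    if (i < lo || i > hi) {
        return false;
    }
    *out = i;
    return true;
}

/* Bit offset of t[i] for an array of ten 32-bit integers,
   mirroring (belongs[0,9] (i) * 32). */
bool elem_offset_bits(long i, long *offset) {
    long checked;
    if (!belongs(0, 9, i, &checked)) {
        return false;   /* the index is outside the array */
    }
    *offset = checked * 32;
    return true;
}
```

With this model, index 3 maps to bit offset 96, while indices -1 and 10 are rejected.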
Integer operations are decomposed into an exact operation followed
by a coercion back to the result's expected range.
int x, y, z;
x = y + z;
x = y * z;
int32 x; int32 y; int32 z;
x =(int32) coerce[-2147483648,2147483647] (y_int32 + z_int32);
x =(int32) coerce[-2147483648,2147483647] (y_int32 * z_int32);
The
coerce operator is also used for
casts between integers of different sizes or signedness.
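The decomposition can be mimicked in C by computing the result in a wider type and then checking the range. This is a sketch under assumptions: `coerce` and `add_int32` are hypothetical names, and treating an out-of-range value as a reported failure is a modeling choice made here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of coerce[lo,hi]: passes v through when it lies
   in [lo, hi] and reports failure otherwise. */
bool coerce(int64_t lo, int64_t hi, int64_t v, int64_t *out) {
    if (v < lo || v > hi) {
        return false;
    }
    *out = v;
    return true;
}

/* x = y + z on 32-bit integers: an exact sum in 64 bits, followed by
   a coercion back to [-2147483648, 2147483647]. */
bool add_int32(int32_t y, int32_t z, int32_t *x) {
    int64_t exact = (int64_t)y + (int64_t)z;   /* cannot overflow in 64 bits */
    int64_t r;
    if (!coerce(INT32_MIN, INT32_MAX, exact, &r)) {
        return false;
    }
    *x = (int32_t)r;
    return true;
}
```

Separating the exact operation from the range check is what makes a potential overflow visible as a single, explicit operator.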
Pointer creations are annotated with the size of the buffer
they designate, so that invalid pointer operations can be detected.
int* x;
int t[100];
x = &t[3];
x = x + 5;
*x = 3;
ptr x;
int32[100] t;
x =(ptr) (focus3200 &(t) + 96);
x =(ptr) (x_ptr + 160);
[x_ptr]32 =(int32) 3;
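One way to picture what the focus annotation enables is to model a pointer as a bit offset into a buffer of known bit size. The sketch below follows the example above (a 3200-bit buffer, shifts of 96 and 160 bits, then a 32-bit access); the type `npk_ptr` and the functions `focus`, `shift` and `can_access` are assumptions introduced here, not Newspeak internals.

```c
#include <stdbool.h>

/* Hypothetical pointer model: a bit offset into a buffer of known bit size. */
struct npk_ptr {
    long size_bits;     /* size of the designated buffer, e.g. 3200 */
    long offset_bits;   /* current offset from the start of the buffer */
};

/* Models focusN &(t): a pointer to the start of an N-bit buffer. */
struct npk_ptr focus(long size_bits) {
    struct npk_ptr p = { size_bits, 0 };
    return p;
}

/* Models pointer arithmetic: shifts the pointer by delta_bits. */
struct npk_ptr shift(struct npk_ptr p, long delta_bits) {
    p.offset_bits += delta_bits;
    return p;
}

/* A width_bits-wide access is valid only if it stays inside the buffer. */
bool can_access(struct npk_ptr p, long width_bits) {
    return p.offset_bits >= 0 && p.offset_bits + width_bits <= p.size_bits;
}
```

In this model, `shift(shift(focus(3200), 96), 160)` sits at bit offset 256, where a 32-bit write is valid. The same write would be flagged at offset 3169 or beyond.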
Casts between integers and pointers are accepted by default, with the warning "dirty cast from pointer to integer".
int* p;
int x;
x = p;
ptr p;
int32 x;
x =(int32) (int32) p_ptr;
When option --reject-dirty-cast is set, such a cast is rejected with a fatal error instead.
int* p;
int x;
x = p;
Fatal error: test.c:5#0: dirty cast from pointer to integer, rewrite your code or remove option --reject-dirty-cast
Commands
Conditionals are translated into alternative choice commands, in the style of Dijkstra's guarded commands.
int x;
if (x < 10) {
x++;
}
int32 x;
choose {
-->
guard((10 > x_int32));
x =(int32) coerce[-2147483648,2147483647] (x_int32 + 1);
-->
guard(! (10 > x_int32));
}
Function return statements are replaced by jumps and labels.
int main() {
int x;
if (x < 10) {
return 1;
}
return 0;
}
int32 main(void) {
int32 x;
do {
choose {
-->
guard((10 > x_int32));
!return =(int32) 1;
goto lbl0;
-->
guard(! (10 > x_int32));
}
!return =(int32) 0;
} with lbl0: {
}
}
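The same transformation can be written by hand in C, which may help when reading the Newspeak output. In this sketch the function name `classify`, the variable `result` (standing in for `!return`) and the choice to pass x as a parameter are assumptions made for illustration.

```c
/* Hand-written C equivalent of the jump-and-label form: every return
   becomes an assignment to a result variable followed by a jump. */
int classify(int x) {
    int result;          /* plays the role of !return */
    if (x < 10) {
        result = 1;
        goto lbl0;       /* early exit replaces "return 1" */
    }
    result = 0;          /* falls through, replaces "return 0" */
lbl0:
    return result;       /* single exit point */
}
```

A single exit point per function means an analysis only has to handle assignments and jumps, not a separate return construct.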
Loops are built with a combination of the alternative, jumps and the infinite
loop.
int x;
x = 0;
while (x < 10) {
x++;
}
int32 x;
x =(int32) 0;
do {
while (1) {
choose {
-->
guard((10 > x_int32));
-->
guard(! (10 > x_int32));
goto lbl1;
}
x =(int32) coerce[-2147483648,2147483647] (x_int32 + 1);
}
} with lbl1: {
}
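For comparison, the loop above can be rewritten in C using only an infinite loop, a two-way choice and a jump. This is a sketch of the same shape, not output from any tool; the function name `count_to_ten` is hypothetical.

```c
/* Hand-written C equivalent of the lowered while loop. */
int count_to_ten(void) {
    int x = 0;
    for (;;) {               /* the infinite loop: while (1) */
        if (x < 10) {
            /* first branch, guard (10 > x): nothing to do, continue */
        } else {
            goto lbl1;       /* second branch, guard !(10 > x): leave the loop */
        }
        x = x + 1;           /* loop body */
    }
lbl1:
    return x;
}
```

The loop condition no longer belongs to the loop construct itself: it is an ordinary choice whose negative branch jumps out.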
The representation of function calls stays close to the source code:
int f(int a, int b) {
return a + b;
}
void main() {
int x, y, z;
z = f(x, y);
}
int32 f(int32 a, int32 b) {
!return =(int32) coerce[-2147483648,2147483647]
(a_int32 + b_int32);
}
void main(void) {
int32 x; int32 y; int32 z;
z <- f(x_int32, y_int32);
}
There is much more! Feel free to experiment and let us know your thoughts.