Things to fix in the language

Everything on this page is just thoughts. Please provide any comments to theo@crazygreek.co.uk.

Lvalue reference return

 
$cows = 0;
 
function & moo() 
{
  global $cows;
  return $cows;
}
 
moo() = 1;
 
echo $cows; // Should print '1'
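
For comparison, current PHP cannot assign to the call itself, but the same effect can be reached by binding the returned reference to a variable first. A minimal sketch, where $ref is just an illustrative name:

```php
<?php
$cows = 0;

// Returns a reference to the global $cows (note the & in the declaration).
function &moo()
{
    global $cows;
    return $cows;
}

// Bind the returned reference to a variable, then assign through it.
$ref = &moo();
$ref = 1;

echo $cows, "\n"; // prints 1
```

The proposal above would simply remove the need for the intermediate variable.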
 

Array lvalue de-referencing assignment

The list() grammar in PHP is absolutely awful. It means this fails for a bizarre reason:

function list()
{
}

Fails with:

Parse error: syntax error, unexpected T_LIST, expecting T_STRING in /home/theo/PHP/phox/moo.php on line 3

Yet:

 echo list($var), "\n";

fails to parse with this error:

Parse error: syntax error, unexpected ',', expecting '=' in /home/theo/PHP/phox/moo.php on line 3

I propose that the grammar for list() is changed from function-style calling to array handling at the grammar level:



 // Same as something like this:

 [ $v1, $v2 ] = array(1, 2);

 // Or even:

 [ $v1, $v2 ] = [ 1, 2 ];

 // Also:

 $a1 = array(1, 2);

 [ $v1, $v2 ] = $a1;
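
As a point of reference, PHP 7.1 and later accept this bracket form as a short spelling of list(), so the proposed syntax can be tried directly. A small sketch (the extra variables $v3 and $v4 are only there for illustration):

```php
<?php
// Short array syntax on both sides (valid in PHP 7.1 and later).
[$v1, $v2] = [1, 2];

// Destructuring an existing array works the same way.
$a1 = array(3, 4);
[$v3, $v4] = $a1;

echo "$v1 $v2 $v3 $v4\n"; // prints 1 2 3 4
```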

In terms of PHP compatibility, we emulate this code:

 list($v1, $v2) = array(1, 2);

like this:

 
 _php_compat_list(
   array(&$v1, &$v2),
   array(1, 2)
 );
 
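 function _php_compat_list(array $lval, array $rval)
 {
   // $lval holds references to the target variables, so writing to its
   // elements writes through to $v1 and $v2. The arrays themselves are
   // passed by value, because array literals cannot be passed by reference.
   $i = 0;
   foreach ($rval as $value)
   {
     $lval[$i++] = $value;
   }
 }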

If we can add a new operator that allows the elements of the right-hand array to be assigned to the variables on the left-hand side, we could do something like this too:

 array(&$v1, &$v2) **[SomeOperator]** array(1, 2);
 assert($v1 == 1);
 assert($v2 == 2);

Which opens up the potential for this to work:

 
  function & list()
  {
    $args = func_get_args();
    return $args;
  }
 
  list($v1, $v2) = array(1, 2);
 

which seems a lot cleaner :)

Namespaces

A must have: namespace xxx {}. Must be nestable. Also possibly allow class declaration within namespaces without being in the namespace directly. For example, this would create the class in com.domain.MyClass:

 
namespace com
{
  class domain.MyClass
  {
  }
}
 

Nested Classes

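This can be useful for access control: a private nested class can only be used from inside its enclosing class, while a public one is reachable from outside.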


class XXX
{

  private class YYY
  {
  }

  public class ZZZ
  {
  }

  function __construct()
  {
    $xxx = new YYY();
  }

}


$moo = new XXX::YYY(); // Error
$moo = new XXX::ZZZ(); // OK

Function calling

Function calling should be far more strict:

  • Check we have correct number of arguments
  • Allow for variable length arguments (... or keyword?)
  • Type checking
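
Current PHP (7.1 and later) can approximate all three points, although only at run time rather than at compile time. A hedged sketch, where sum() is a hypothetical example function:

```php
<?php
declare(strict_types=1); // no implicit scalar conversions for calls in this file

// Hypothetical example function: one required int plus a variable-length tail.
function sum(int $first, int ...$rest): int
{
    return $first + array_sum($rest);
}

echo sum(1, 2, 3), "\n"; // prints 6

try {
    sum();               // too few arguments
} catch (ArgumentCountError $e) {
    echo "wrong argument count\n";
}

try {
    sum('1');            // string where an int is required
} catch (TypeError $e) {
    echo "wrong argument type\n";
}
```

The difference from the wish list above is that these errors only surface when the call is executed.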

Member overloading

class MyClass
{
 
  function moo()
  {
  }
 
  function moo(int $x)
  {
  }
 
}
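
PHP rejects two methods with the same name, so today the usual workaround is a single method that dispatches on its arguments. A rough sketch of that emulation (the returned strings are purely illustrative):

```php
<?php
class MyClass
{
    // A single variadic method stands in for moo() and moo(int $x).
    public function moo(int ...$args): string
    {
        switch (count($args)) {
            case 0:
                return 'moo()';
            case 1:
                return 'moo(int ' . $args[0] . ')';
            default:
                throw new InvalidArgumentException('no matching overload of moo()');
        }
    }
}

$c = new MyClass();
echo $c->moo(), "\n";   // prints moo()
echo $c->moo(5), "\n";  // prints moo(int 5)
```

Real overloading would move this dispatch into the compiler, where a missing overload could be reported before the code runs.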

Scoping + Typing madness

Scoping needs to be cleaned up, and some people want to be able to perform more static typing to aid debugging at compile time.

In Phix mode, we should be able to always require variables to be declared.

  • Even in non-strict mode, variables must always be declared by assignment before being usable. Anything else will fail at compile time:
  // [[StaticTyping(false)]]
  // [[StrictMode(false)]]
  $a = 0; // OK - $a is a dynamic variable
  $b = $c + 2; // ERROR: $c has not yet been declared
  • In strict mode, the variable must be explicitly declared first, potentially with a type too:
  // [[StaticTyping(false)]]
  // [[StrictMode(true)]]
  var $a = 1; // dynamically typed variable
  int $a = 1; // statically typed - $a is only allowed to contain ints
  $a = '45';  // '45' will be implicitly converted to int32(45)

In fact, auto might then be nice:

 
  // [[StaticTyping(true)]]
  auto $a = 1; // will make $a an int and needs to stay an int, much like the auto keyword in C++
 
  function moo() { return 1; }
  auto $b = moo(); // moo does not specify an explicit return type, so will fail unless cast first.
 
  function int moo1() { return 1; }
  auto $c = moo1(); // OK, $c is a static int.
 

FIXME: Need to work out how this will work with references.

The type of the lhs can only be inferred at compile time, so a call to a function without an explicit return type will need to be cast to a type (or the variable declared with an explicit type instead of auto).

Variable Variables (varvars)

In strict mode, we can no longer use varvars:

  // [[StrictMode(true)]]
  $moo = 'x';
  $x = 5;
  $$moo = 6; // will fail, as we're in strict mode, can't use variable variables!

**Rationale**: Varvar functionality can always be achieved just as well with named arrays, and the advantages of being able to optimize a lot more aggressively far outweigh the advantages of being able to use varvars.

By disallowing varvars, we can infer so much more information from analyzing the call graph.
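
The array replacement mentioned in the rationale can be sketched in ordinary PHP. Here $vars is a hypothetical container name; the point is that the dynamic names live in one explicit place and can no longer alias ordinary variables:

```php
<?php
// Variable-variable version: the target name is only known at run time.
$moo = 'x';
$x = 5;
$$moo = 6;               // silently rewrites $x

// Array version: dynamic names are keys of one explicit array,
// so a static analyser knows $x itself is never touched this way.
$vars = array('x' => 5);
$vars[$moo] = 6;

echo $x, ' ', $vars['x'], "\n"; // prints 6 6
```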

Strictly Typed

Need to think about this. Perhaps some mode that allows us to ensure we never do implicit casts, and all vars must be typed?

Grammar fixes

The following should just not be constructs and instead should be normal function calls:

  • include
  • include_once
  • require
  • require_once

The following should be replaced by new array semantics:

  • array()
  • list()

The following should just be removed for good:

  • elseif (else if does a perfectly good job)
  • print
  • ‘var’ in classes

Normalize the use of allowed keywords in the language. Currently, double and float are the same thing, as are int and integer, etc.

  • ‘exit()’ is currently valid, which syntactically is a function call; however, ‘exit;’ and ‘exit expr’ also work. Normalize it to be either a function or a statement with a trailing expression, BUT NOT BOTH!!

Backticks for execution are bad, and just there “because perl has it”. It’s a hack and evil.

Currently, PHP allows for a colon or a semi-colon after a case/default statement. This should be normalized to only a colon.

switch ($x)
{
  case 1:
  case 0;
}

Case Sensitivity

Make all functions, variables, class names, operators (and, or), and keywords either case sensitive OR case insensitive. Personally I prefer case insensitive, as it stops people having 2 classes/variables/functions whose names differ only by case!

Perhaps make everything case sensitive for clarity, but raise an error if we use/define a name with a different case.

At the moment, the lexer is case insensitive for keywords.
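
For reference, the mix in current PHP can be shown with a short sketch (the function and variable names are only illustrative): function names and keywords ignore case, while variable names do not.

```php
<?php
function Moo() { return 'moo'; }

// Function names and keywords are case insensitive...
ECHO moo(), "\n";        // prints moo
echo MOO(), "\n";        // prints moo

// ...but variable names are case sensitive.
$cows = 1;
$Cows = 2;
echo $cows, ' ', $Cows, "\n"; // prints 1 2
```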

Operator overloading

Consider implementing this to allow users to override operators (==, !=, ===, !==, etc.)

Strict language features

One thing that annoys me about PHP (and a lot of other dynamic languages) is the amount of time that is repeatedly spent on doing things that could be done just once.

Mmapping bytecode and then JIT’ing it or running it through the VM is fairly efficient; however, the result of the JIT compilation should be writable, so that it’s not done over and over.

In light of this, it would be beneficial to be able to convert PHP code directly into opcodes for a particular platform, to run natively without the JIT process and interpreter, having perhaps just libphix_runtime.so linked in or something.

Application Distribution

On the same note as above, it would be nice to have some form of packaging ‘Phix applications’ easily in a single file, pre-bytecode generated.

Obviously this would mean having a stable set of opcodes (which is not the case at the moment :)), and a framework for creating the applications. I like the idea of turning a complete directory into a Phix application, with a file/function/class method specified as the entry point.

Would need to consider how we overwrite particular files, as well as how we allow writing to files (such as config.php).

Web Framework

Some sort of web controller API for the web server Phix module to communicate with the PHP code would be good, and would allow a far nicer abstraction of the interface between the web server and the PHP application.

Something like:

interface net.pengus.phix.HTTP.Request
{
  abstract public function onRequest();
}

Need to work out how to handle asynchronous responses, which I’m really keen to support in Phix for large scale AJAX stuff.

Data Input Security Issues

Magic quotes is EVIL. The problem is, so many PHP ‘developers’ are complete n00bs, and have absolutely no idea about SQL injection. Even when they do, many do not understand the real problems, or how to write secure code.

Because of this, PHP has got a very bad name as being insecure, due to the absolutely huge number of PHP applications that are responsible for servers being hacked.

As much as I don’t personally like ‘hackey’ solutions like magic quotes, I think it’s fairly important that the language can try to provide built in solutions for prevalent security issues.

So I see a few ways of helping entry level users write more secure code:

  • Late binding stringification
  • Tainting
  • Data query language

Late binding stringification

It may be possible to late bind the actual stringification of variables embedded within strings, deferring it until consumption. For example, this scarily insecure piece of code:

  $sql = "SELECT id FROM moo WHERE id = {$_GET['id']}";
  // execute $sql

Could be late bound so that the consumer gets the string with substituted parameters and a parameter list, much like:

 
  /* some internal */ function execute_sql($str)
  {
    echo $str->raw_str, "\n";
    print_r($str->parameters);
    /* Merge the two, and escape any parameters */
  }
 
  // Would output something like:
 
  SELECT id FROM moo WHERE id = $1
  array(
    [0] => '1'
  )
 

Of course, this wouldn’t work, because evil code like this would then break:

  $sql = "SELECT id FROM moo WHERE id = $_GET['id'] ORDER BY $_GET['order'] {array_key_exists('desc', $_GET) ? 'DESC' : 'ASC'}";

... as we don’t know the difference between language constructs and expression/field values.

Only by being able to parse the SQL language could we possibly know whether or not to quote it (or in fact provide it as a bind parameter).

So further thought is needed here. Either way, the late binding could be useful.
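
To make the idea a little more concrete, here is a rough userland sketch of what a consumer might receive. LateBoundString and merge_sql() are hypothetical names, and the quote-doubling is only for illustration; a real consumer would more likely hand the values to the database as bind parameters.

```php
<?php
// Hypothetical stand-in for a late bound string: the literal text with
// numbered placeholders, plus the values that were embedded in it.
class LateBoundString
{
    public $raw_str;
    public $parameters;

    public function __construct(string $raw_str, array $parameters)
    {
        $this->raw_str = $raw_str;
        $this->parameters = $parameters;
    }
}

// The consumer decides how to merge text and values. This one quotes every
// parameter as an SQL string literal by doubling single quotes.
function merge_sql(LateBoundString $str): string
{
    return preg_replace_callback('/\$(\d+)/', function ($m) use ($str) {
        $value = (string) $str->parameters[$m[1] - 1];
        return "'" . str_replace("'", "''", $value) . "'";
    }, $str->raw_str);
}

// What the runtime might build for a string that embeds $_GET['id'].
$sql = new LateBoundString(
    'SELECT id FROM moo WHERE id = $1',
    array("1' OR '1'='1")
);

echo merge_sql($sql), "\n";
// prints SELECT id FROM moo WHERE id = '1'' OR ''1''=''1'
```

Because the text and the values arrive separately, the injected quote characters end up inside a single string literal instead of changing the query structure.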

Tainting

By tainting all externally generated data that has not yet been cleaned, we can help ensure that unsanitised strings are not used where they shouldn’t be (system calls, SQL, etc, etc).
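
Without runtime support, tainting can be roughly imitated with a wrapper type. This is only a sketch under that assumption: the Tainted class and run_query() are hypothetical, and built-in tainting would track plain strings rather than require a wrapper.

```php
<?php
// Hypothetical wrapper marking a value as externally supplied and uncleaned.
class Tainted
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    // The only way out is through a cleaning function chosen by the caller.
    public function clean(callable $sanitiser): string
    {
        return (string) $sanitiser($this->value);
    }
}

// A sensitive sink that only accepts plain strings, so a Tainted value
// passed by mistake is rejected with a TypeError.
function run_query(string $sql): string
{
    return "executing: $sql";
}

$id = new Tainted('42; DROP TABLE moo');

try {
    run_query($id);                       // tainted value used directly
} catch (TypeError $e) {
    echo "rejected tainted input\n";
}

// The built-in intval() serves as the cleaning step here.
$clean = $id->clean('intval');
echo run_query("SELECT id FROM moo WHERE id = $clean"), "\n";
```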

Data query language

C# 3.0 provides a data query language (LINQ) that actually looks pretty funky, in a kind of hacky way. It could possibly solve SQL injection security issues though.

Data Output Security Issues

The second batch of security problems comes with one of PHP’s most common usage domains, generating HTML: XSS.

The root of the problem is, in my opinion, how easy it is to slip up in PHP:

... xxx ...

... OOPS! XSS vuln here.
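
A typical slip of this kind can be sketched as follows. The $name value is a hypothetical stand-in for request data, and htmlspecialchars() is the standard escaping call for an HTML text context:

```php
<?php
// Stand-in for request data such as a query string value (hypothetical).
$name = '<script>alert(1)</script>';

// Easy to write, and vulnerable: the markup goes out unescaped.
$unsafe = "<p>Hello $name</p>";

// Escaped for an HTML text context.
$safe = '<p>Hello ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';

echo $unsafe, "\n";
echo $safe, "\n"; // prints <p>Hello &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Nothing in the language distinguishes the two lines, which is why forgetting the escape call is so easy.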

The tidiest way to output HTML is to use XSL with an applied XML data set, which ensures that data is always escaped as it should be, and also ensures that the output HTML is syntactically correct.

I like the idea of being able to mix an XSL style language with PHP, to ensure that not only is all data correctly escaped, but also that code is either syntactically or semantically correct.

Something like:

 
<?xml version="1.0"?>
 
<html xmlns="xxx" xmlns:phix="yyy">
  <phix:exec>
    $moo = 'cows';
    $data = array(1,2,3,4,5);
  </phix:exec>
 
  <phix:for-each select="$data">
    <li><phix:value-of select="."/></li>
  </phix:for-each>
 
</html>
 

Even better would be to put the code in one file (model) and the template in another (view). Yes, I know this is exactly what Smarty is, but Smarty just ain’t smart enough IMO.

Of course, this would not be built into the language, but instead, a framework to support this (and any other template or output fixup system) could be added.

The default security framework could then use this along with late bound stringification to try to detect XSS issues.

 
phixes.txt · Last modified: 2007/03/18 06:47 by 80.249.108.13
 