I use readfile() to let the client download a file through my server, so I output the data I receive from readfile('external-url') directly to the client.

Now I want to determine the traffic caused by readfile().

I can determine it from the return value of readfile(), but only if the client finishes the download. Otherwise the script is aborted and readfile() returns 0.

First I tried this code:

//outputs download headers
//creating $stream_context with request headers for the external download server
$traffic = readfile($url, false, $stream_context);
//save traffic...

The save-traffic part was never reached when the client stopped downloading.

Then I registered a shutdown function with register_shutdown_function() that reads $traffic as a global variable and saves it. Now the traffic file was created, but the recorded traffic was 0.
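In outline, that attempt looked like this (a simplified sketch; SaveTrafficOnShutdown, 'traffic.txt', $url and $stream_context stand in for my real helpers and values):

//Sketch of the shutdown-function attempt; names are stand-ins for my real code
$traffic = 0;

function SaveTrafficOnShutdown() {
    global $traffic;
    //this runs even when the client aborts, but $traffic is still 0 here,
    //because readfile() only returns its byte count after a complete transfer
    file_put_contents('traffic.txt', $traffic, FILE_APPEND);
}
register_shutdown_function('SaveTrafficOnShutdown');

$traffic = readfile($url, false, $stream_context); //never assigned when the client aborts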

I don't have access to the server logs or anything like that; I can only use PHP and .htaccess.

One workaround, which I use now, is to start a request for the file, parse the filesize and add the complete filesize to the client's traffic. Then I start the download with readfile(). If the client stops downloading, it is handled as if they had downloaded the whole file.
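In outline (AddTrafficForUser() is a placeholder for the routine that writes to the client's traffic file):

//Sketch of the pre-charge workaround; AddTrafficForUser() is a placeholder
AddTrafficForUser($username, $download['filesize']); //charge the complete filesize up front
$context = stream_context_create($requestOptions);
readfile($download['url'], false, $context); //an aborted download is already fully counted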

A third method could be curl with its CURLOPT_WRITEFUNCTION option, but that looked like too much overhead for the server, out of proportion to what I actually want to do: record the real traffic.

There is another problem with counting the client traffic before the download starts: I want to support resuming and chunked downloads (multiple connections to one file for a faster download). That still works, but counting the traffic is the problem! For chunks I can parse the HTTP Range header to determine which parts of the file were requested and count those as traffic, but what about resuming?
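For the chunked case, counting only the requested part could look roughly like this (a sketch that handles a single Range header; it does not solve the resuming problem, because an aborted chunk is still charged completely):

//Sketch: number of bytes requested by a single-range header like
//"bytes=1000-1999", "bytes=1000-" or "bytes=-500"
function RequestedBytes($filesize) {
    if(!isset($_SERVER['HTTP_RANGE']) ||
       !preg_match('/bytes\s*=\s*(\d*)-(\d*)/', $_SERVER['HTTP_RANGE'], $m)) {
        return $filesize;                     //no (usable) Range header: whole file
    }
    if($m[1] === '') {                        //suffix range "bytes=-500"
        return min(intval($m[2]), $filesize); //the last N bytes
    }
    $start = intval($m[1]);
    $stop  = ($m[2] === '') ? $filesize - 1 : intval($m[2]);
    return $stop - $start + 1;
}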

So, is there a workable solution for this?

I still don't use a database; I only use .htaccess login information to identify the clients and save the used traffic for each client in a separate file on my webspace.
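Since the login is HTTP Basic auth via .htaccess, identifying the client and updating their traffic file could look like this (a sketch; the traffic/ directory and the file layout are assumptions, and PHP_AUTH_USER is only available when PHP sees the Basic auth credentials):

//Sketch of the file-based traffic store; the traffic/ directory is an assumption
$username = basename($_SERVER['PHP_AUTH_USER']); //set by the .htaccess Basic auth
$file = dirname(__FILE__).'/traffic/'.$username.'.txt';

$used = is_file($file) ? intval(file_get_contents($file)) : 0;
file_put_contents($file, $used + $traffic, LOCK_EX);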

Here is my code:

//$download = array(url, filesize, filename), obtained with a separate curl request to the external file
$downloadHeader = CreateDownloadHeaders($download, $_hoster->AcceptRanges());

$requestOptions = array(
    'http'=>array(
        'method' => 'GET',
        'header' => CreateRequestHeaders($download['filesize'], $_hoster->AcceptRanges())
    )
);

$requestOptions['http']['header'] = array_merge($requestOptions['http']['header'], $_hoster->GetAdditionalHeaders());

//Output download headers for our client
foreach($downloadHeader as $header) {
    header($header);
}


register_shutdown_function('SaveTraffic', $username, $givenUrl, $download['filename'], $download['filesize']);
//SaveTraffic($username, $givenUrl, $download['filename'], $download['filesize']);

$context = stream_context_create($requestOptions);
$traffic = readfile($download['url'], false, $context);

And now the functions:

function CreateDownloadHeaders($download, $acceptRanges) {
    //IE workaround for downloads
    $type = (isset($_SERVER['HTTP_USER_AGENT']) && strpos($_SERVER['HTTP_USER_AGENT'],'MSIE')) ? 'force-download' : 'octet-stream';

    $headers = array(
        'Content-Type: application/' . $type,
        'Content-Disposition: attachment; filename="'.$download['filename'].'"',
        'Content-Length: '.$download['filesize'],
        'Content-Transfer-Encoding: Binary',
        'Expires: 0',
        'Cache-Control: must-revalidate, post-check=0, pre-check=0',
        'Pragma: public',
        'Connection: close'
    );

    $headers = AddDownloadRangeHeaders($headers, $acceptRanges, $download['filesize']);

    return $headers;
}


function CreateRequestHeaders($filesize, $acceptRanges) {
    $headers = array();

    $headers = AddRequestRangeHeaders($headers, $acceptRanges, $filesize);
    $headers[] = 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13';
    $headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
    $headers[] = 'Accept-Language: de, en-gb;q=0.9, en;q=0.8';
    $headers[] = 'Accept-Encoding: gzip, deflate';
    $headers[] = 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7';
    $headers[] = 'Cache-Control: no-cache';
    $headers[] = 'Pragma: no-cache';
    $headers[] = 'Connection: close';

    return $headers;
}


function AddDownloadRangeHeaders($headers, $acceptRanges, $filesize) {
    if($acceptRanges !== true) {
        $headers[] = 'Accept-Ranges: none';
    }
    elseif(isset($_SERVER['HTTP_RANGE'])) {
        preg_match('/bytes([[:space:]])?=([[:space:]])?(\d+)?-(\d+)?/', $_SERVER['HTTP_RANGE'], $matches);

        $start = intval($matches[3]);
        $stop = intval($matches[4]);

        if($stop == 0) {
            //open-ended range "bytes=N-": the last byte is at $filesize - 1
            $stop = $filesize - 1;
        }

        $headers[] = 'HTTP/1.1 206 Partial Content';
        $headers[] = 'Accept-Ranges: bytes';
        $headers[] = 'Content-Range: bytes ' . $start . '-' . $stop . '/' . $filesize;

        $newSize = $stop - $start + 1;
        $key = array_search('Content-Length: '.$filesize, $headers);
        $headers[$key] = 'Content-Length: '.$newSize;
    }

    return $headers;
}


function AddRequestRangeHeaders($headers, $acceptRanges, $filesize) {
    if($acceptRanges === true && isset($_SERVER['HTTP_RANGE'])) {
        preg_match('/bytes([[:space:]])?=([[:space:]])?(\d+)?-(\d+)?/', $_SERVER['HTTP_RANGE'], $matches);

        $start = intval($matches[3]);
        $stop = intval($matches[4]);

        if($stop == 0) {
            //open-ended range "bytes=N-": request up to the last byte ($filesize - 1)
            $stop = $filesize - 1;
        }

        $headers[] = 'Range: bytes='.$start.'-'.$stop;
    }

    return $headers;
}

Solution

I was thinking about how curl implements its save-to-file-stream feature. I realized it must work through something like a special CURLOPT_WRITEFUNCTION, because stopping the script while curl saves to a file stream leaves a file on my webspace that contains the part loaded so far.

Therefore I tried it with CURLOPT_WRITEFUNCTION, and it turns out not to be as resource-intensive as I thought.

Now I use register_shutdown_function() to call a function that saves the used traffic. My CURLOPT_WRITEFUNCTION callback counts the transferred data, which is the traffic.

It's also important to store the current working directory in a variable if you want to save the traffic to a file, because inside a registered shutdown function relative paths are no longer resolved against your script's directory but against the server's root directory! You can also use absolute paths instead of the stored cwd.

function readResponse($ch, $data) {
    global $traffic;

    $length = mb_strlen($data, '8bit');

    //count traffic
    $traffic += $length;
    //output loaded data
    echo $data;

    //curl aborts the transfer if this callback does not return the number of bytes it received
    return $length;
}

function saveTraffic($username, $cwd) {
    global $traffic;

    $h = fopen($cwd.'/relative-path/traffic_'.$username.'.txt', 'ab');
    fwrite($h, $traffic);
    fclose($h);
}

//...

$cwd = getcwd();
register_shutdown_function('saveTraffic', $username, $cwd);

$traffic = 0;

//...
//Output download header information to client
//...

curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'readResponse');

//...

curl_exec($ch);
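For completeness: the omitted setup is essentially just creating the handle for the external URL, and forwarding the request headers built by CreateRequestHeaders() from the question keeps the client's Range header (and therefore resuming) working. A sketch of those two lines, not necessarily how it is wired in my script:

//Sketch of the omitted handle setup; the curl_setopt() calls and curl_exec() above follow as shown
$ch = curl_init($download['url']);
curl_setopt($ch, CURLOPT_HTTPHEADER, CreateRequestHeaders($download['filesize'], $_hoster->AcceptRanges()));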

Thank you for all your help! It was very useful!
