Question

I am experimenting with PHPQuery (https://code.google.com/p/phpquery/) to scrape data from my website. I want to extract meta information from a page.

Here is what I have tried so far:

$html = phpQuery::newDocumentHTML($file, $charset = 'utf-8');

$MetaItems = [];
foreach (pq('meta') as $keys) {
    $names = trim(strtolower(pq($keys)->attr('name')));
    if ($names !== null && $names !== '') {
        array_push($MetaItems, $names);
    }
}
            
for ($i=0; $i < count($MetaItems); $i++) {
    $test = 'meta[name="' . $MetaItems[$i] . '"]';
    echo pq($test)->html();
}

In the code above, $MetaItems collects the name attribute of every meta tag. This array is filled correctly.

But selecting each tag and extracting its text does not work. How do I get the code above to work? Thanks.


Solution

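You want an associative array of name => content, correct? A meta tag has no inner HTML, so html() gives you nothing back; the value you are after is in the content attribute. Try this: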

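$metaItems = array();
foreach (pq('meta') as $meta) {
  $key = pq($meta)->attr('name');
  $value = pq($meta)->attr('content');
  // Skip tags that carry no name, such as <meta charset> or <meta http-equiv>.
  if ($key === null || $key === '') {
    continue;
  }
  $metaItems[$key] = $value;
}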

var_dump($metaItems);
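
If you want to check the idea without phpQuery installed, here is a minimal sketch of the same name => content loop using PHP's built-in DOMDocument instead. The sample markup is made up for illustration, and this is a swapped-in technique, not phpQuery's own API:

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$html = '<html><head>'
      . '<meta charset="utf-8">'
      . '<meta name="description" content="Example page">'
      . '<meta name="Keywords" content="php, scraping">'
      . '</head><body></body></html>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$metaItems = array();
foreach ($doc->getElementsByTagName('meta') as $meta) {
    // Skip tags such as <meta charset> that have no name attribute.
    if (!$meta->hasAttribute('name')) {
        continue;
    }
    // The value lives in the content attribute, not in the tag body.
    $metaItems[$meta->getAttribute('name')] = $meta->getAttribute('content');
}

print_r($metaItems);
```

The keys keep whatever casing the page uses, so look them up with the exact same spelling.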

Other tips

Going on the assumption that the values you extract must be exactly the same as the values of the name attributes you're trying to select: I'm pretty sure the value of the name attribute is case-sensitive. You need to remove the strtolower and the trim, since either one could change the value so that the selector no longer matches. I would replace the first part with this:

$html = phpQuery::newDocumentHTML($file, $charset = 'utf-8');

$MetaItems = [];
foreach (pq('meta') as $keys) {
    $names = pq($keys)->attr('name');
    if (!empty($names) && trim($names)) {
        array_push($MetaItems, $names);
    }
}
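
To illustrate the case-sensitivity point, here is a small sketch using PHP's built-in DOMXPath rather than phpQuery. I am assuming phpQuery's selector matching behaves the same way, which I have not verified; the markup is invented for the example:

```php
<?php
// Hypothetical page whose meta name starts with a capital letter.
$html = '<html><head>'
      . '<meta name="Description" content="Example page">'
      . '</head><body></body></html>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Query with the name exactly as it appears in the page.
$exact = $xpath->query('//meta[@name="Description"]');

// Query with the lowercased name, as strtolower() would produce.
$lowered = $xpath->query('//meta[@name="description"]');

echo 'exact match: ', $exact->length, "\n";
echo 'lowercased:  ', $lowered->length, "\n";
```

The lowercased query finds nothing here, because XPath compares attribute values as-is.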

Hope that helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow