Fetching a NetEase News headline with phpQuery plus file_get_contents() or curl fails both ways. How can I fix this?



<?php

require('phpQuery/phpQuery.php');

$link = "http://news.163.com/19/0131/15/E6S1OSOL000189DH.html";

/*function curl_get($url, $gzip=false){
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
if($gzip) curl_setopt($curl, CURLOPT_ENCODING, "gzip"); // this is the key part
$content = curl_exec($curl);
curl_close($curl);
return $content;
}

$content = curl_get($link,$gzip=false);*/

$content = file_get_contents($link);

$content = iconv("gb2312", "utf-8//IGNORE", $content);

phpQuery::newDocumentFile($content); // initialize from the HTML content

$title = pq(".post_content_main h1")->text();

var_dump($title);
?>

After fetching, this is what I get:
Warning: file_get_contents(<!DOCTYPE HTML>      <html id="ne_wrap">  <head> <title>网易哒哒：用更短的时间，带你看更酷的世界_网易新闻</title> <base target="_blank"/> <meta http-equiv="expires" content="0"/> <meta http-equiv="Cache-Control" content="no-transform"/> <meta http-equiv="Cache-Control" cont in D:\xin\phpStudy\PHPTutorial\www\php_learn\phpquery\phpQuery\phpQuery.php on line 408
string(0) ""

All I want is the article title: 网易哒哒：用更短的时间，带你看更酷的世界
Thanks a lot.

7 replies • 2019-06-28 09:32:04 +08:00

77sec

2019-02-12 22:11:38 +08:00

What is going on here? I couldn't find anything about it on Baidu.

KasuganoSoras

2019-02-12 22:12:50 +08:00

I'm not sure what you're trying to do.
Why is the page content being passed to file_get_contents as if it were a URL?
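To spell out the point above: the warning shows file_get_contents() being called from inside phpQuery.php with the whole page as its argument, which suggests newDocumentFile() expects a path or URL rather than markup. A minimal sketch of parsing an HTML string that has already been fetched follows. It swaps in PHP's built-in DOMDocument and DOMXPath for phpQuery so that it runs without the library; extract_heading() is a hypothetical helper name, and the XPath is only an approximation of the `.post_content_main h1` selector from the question.

```php
<?php
// Hypothetical helper: return the text of the first <h1> inside an element
// whose class list contains "post_content_main", or null if none is found.
function extract_heading(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parse errors quietly.
    libxml_use_internal_errors(true);
    // The XML encoding hint asks libxml to treat the input bytes as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//*[contains(concat(" ", normalize-space(@class), " "), " post_content_main ")]//h1'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

With phpQuery itself, the corresponding change would presumably be to call phpQuery::newDocumentHTML($content) on the converted string instead of phpQuery::newDocumentFile($content).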

GDC

2019-02-12 22:48:31 +08:00

@KasuganoSoras His curl section is commented out.

580a388da131

2019-02-12 23:17:38 +08:00 via iPhone

newDocumentFile($link)

580a388da131

2019-02-12 23:29:47 +08:00 via iPhone

Oh, I forgot:
pq("title")

pinerge

2019-03-02 09:41:51 +08:00

<?php

$res = curl('http://news.163.com/19/0131/15/E6S1OSOL000189DH.html');

echo $res;

function curl($url)
{
$ch = curl_init();
$options = [
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_AUTOREFERER => true,
CURLOPT_FOLLOWLOCATION => true,
// CURLOPT_SSL_VERIFYHOST => 0,
// CURLOPT_SSL_VERIFYPEER => false,
// CURLOPT_TIMEOUT_MS => 20000,
// CURLOPT_REFERER => 'https://alpha.wallhaven.cc/wallpaper/' . $id,
// CURLOPT_HTTPHEADER => [
// 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
// ],
];
curl_setopt_array($ch, $options);
$res = curl_exec($ch);
curl_close($ch);
preg_match('#<title>([\W\w]+)_网易新闻</title>#', iconv('gbk', 'utf-8', $res), $matches);
return $matches[1];
}
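The snippet above returns $matches[1] without checking whether the pattern matched, so a changed page layout would raise a notice. Below is a slightly more defensive sketch of the same idea, separated from the network call so it can be tried on any byte string. title_from_gbk_html() and its $suffix parameter are hypothetical names, and the assumption that the page is GBK-encoded comes from the replies in this thread rather than from inspecting the response headers.

```php
<?php
// Hypothetical helper: convert GBK-encoded HTML to UTF-8 and return the
// <title> text with an optional site suffix removed, or null on failure.
function title_from_gbk_html(string $raw, string $suffix = '_网易新闻'): ?string
{
    // //IGNORE asks iconv to drop byte sequences it cannot convert.
    $utf8 = @iconv('GBK', 'UTF-8//IGNORE', $raw);
    if ($utf8 === false) {
        return null;
    }
    // Non-greedy match; "s" lets "." span newlines, "u" enables UTF-8 mode.
    if (preg_match('#<title>(.*?)</title>#su', $utf8, $m) !== 1) {
        return null;
    }
    $title = trim($m[1]);
    // Strip the suffix only when the title actually ends with it.
    if ($suffix !== '' && substr($title, -strlen($suffix)) === $suffix) {
        $title = substr($title, 0, -strlen($suffix));
    }
    return $title;
}
```

It would be called on the raw result of curl_exec(), for example title_from_gbk_html($res), after checking that $res is not false.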

xpxw

2019-06-28 09:32:04 +08:00

The problem is probably a missing key header. As with Baidu and QQ, if you send a request with empty headers you get no content back.
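If headers do turn out to matter, which this thread does not confirm, file_get_contents() can send them through a stream context. The sketch below only builds the context; browser_like_context() is a hypothetical helper name, and the header values are illustrative rather than known requirements of the site.

```php
<?php
// Hypothetical helper: build a stream context that sends browser-like headers.
function browser_like_context(string $userAgent, int $timeoutSeconds = 10)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            // Each header line ends with CRLF.
            'header'  => "User-Agent: {$userAgent}\r\n" .
                         "Accept: text/html\r\n",
            'timeout' => $timeoutSeconds,
        ],
    ]);
}

// Usage (performs a network request, so it is left commented out):
// $ctx = browser_like_context('Mozilla/5.0 (Windows NT 10.0; Win64; x64)');
// $content = file_get_contents($link, false, $ctx);
```

The curl equivalent would be setting CURLOPT_USERAGENT or CURLOPT_HTTPHEADER, as hinted at by the commented-out options in the earlier reply.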
