
After crawling a feed file, Baiduspider requests the wrong address for the corresponding page, causing the crawl to fail


For example, after the spider fetches

123.125.71.16 - - [25/Aug/2019:01:42:22 +0800] 'GET /2224.html/feed HTTP/1.1' 200 979 '-' 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'

it then goes on to request

220.181.108.158 - - [25/Aug/2019:02:18:41 +0800] 'GET /www.whlihun.com/2224.html HTTP/1.1' 404 479 '-' 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'

But because the requested path has an extra www.whlihun.com in front of it, the request returns 404. Is my feed configured incorrectly?
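To see how often this happens across a whole access log rather than in one pair of lines, a short filter can pick out Baiduspider requests that got a 404 and whose path begins with the site's own hostname. This is a minimal Python sketch, assuming the log layout shown above and hard-coding the host www.whlihun.com; the post renders the quotes as single quotes while real Nginx/Apache combined logs use double quotes, so the regex accepts either. The helper name `find_host_in_path_404s` is hypothetical.

```python
import re

# Assumed site host, taken from the 404 log line in the question.
SITE_HOST = "www.whlihun.com"

# Combined-log-style line; quotes may be ' (as rendered in the post) or ".
LOG_RE = re.compile(
    r"^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "
    r"['\"](?P<method>\S+) (?P<path>\S+) (?P<proto>[^'\"]+)['\"] "
    r"(?P<status>\d{3}) \S+ ['\"][^'\"]*['\"] ['\"](?P<ua>[^'\"]*)['\"]"
)


def find_host_in_path_404s(lines, host=SITE_HOST):
    """Hypothetical helper: yield (time, path) for Baiduspider 404s
    whose request path starts with /<host>/."""
    prefix = "/" + host + "/"
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in this log format
        if (m["status"] == "404"
                and "Baiduspider" in m["ua"]
                and m["path"].startswith(prefix)):
            yield m["time"], m["path"]


# The two log lines from the question.
UA = "'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'"
SAMPLE = [
    "123.125.71.16 - - [25/Aug/2019:01:42:22 +0800] 'GET /2224.html/feed HTTP/1.1' 200 979 '-' " + UA,
    "220.181.108.158 - - [25/Aug/2019:02:18:41 +0800] 'GET /www.whlihun.com/2224.html HTTP/1.1' 404 479 '-' " + UA,
]

for t, p in find_host_in_path_404s(SAMPLE):
    print(t, p)
# prints: 25/Aug/2019:02:18:41 +0800 /www.whlihun.com/2224.html
```

On a real server, passing `open("access.log")` instead of `SAMPLE` would stream the file line by line; matching "Baiduspider" in the user-agent does not verify that the request really came from Baidu.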

The feed content is as follows:

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  >
<channel>
  <title>Comments on “Is a Divorce Agreement Subject to Contract Law?”</title>
  <atom:link href="https://www.whlihun.com/2224.html/feed" rel="self" type="application/rss+xml" />
  <link>https://www.whlihun.com/2224.html</link>
  <description></description>
  <lastBuildDate>Wed, 26 Dec 2018 10:06:23 +0000</lastBuildDate>
  <sy:updatePeriod>hourly</sy:updatePeriod>
  <sy:updateFrequency>1</sy:updateFrequency>
  <generator>https://wordpress.org/?v=5.2.2</generator>
</channel>
</rss>
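In the feed above, both the `<link>` text and the `atom:link` `href` are absolute `https://www.whlihun.com/...` URLs, so the feed by itself does not obviously explain the extra path segment. One mechanism that does produce exactly `/www.whlihun.com/2224.html` is a link written without its scheme, such as `www.whlihun.com/2224.html`: standard URL resolution treats it as a relative path and resolves it against the URL of the page it appears on. The Python sketch below, assuming a trimmed copy of the feed rather than a live fetch, lists the URLs the feed exposes, checks that each is absolute, and shows that resolution. Whether such a scheme-less link exists anywhere on the site is an open assumption; the logs do not confirm it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlsplit

ATOM_NS = "http://www.w3.org/2005/Atom"

# Trimmed copy of the comment feed quoted above (only the link elements kept).
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<atom:link href="https://www.whlihun.com/2224.html/feed" rel="self" type="application/rss+xml" />
<link>https://www.whlihun.com/2224.html</link>
</channel>
</rss>"""


def feed_urls(xml_bytes):
    """Return URLs from plain <link> text, then from atom:link href attributes."""
    root = ET.fromstring(xml_bytes)
    urls = [el.text.strip() for el in root.iter("link") if el.text]
    urls += [el.get("href") for el in root.iter(f"{{{ATOM_NS}}}link")]
    return urls


def is_absolute(url):
    """True only if the URL carries both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


for url in feed_urls(FEED.encode("utf-8")):
    print(url, "absolute" if is_absolute(url) else "RELATIVE")

# Hypothetical scheme-less link, resolved against the article URL:
print(urljoin("https://www.whlihun.com/2224.html", "www.whlihun.com/2224.html"))
# -> https://www.whlihun.com/www.whlihun.com/2224.html  (the 404 path in the log)

# A protocol-relative link ("//host/path") resolves to the intended page:
print(urljoin("https://www.whlihun.com/2224.html", "//www.whlihun.com/2224.html"))
# -> https://www.whlihun.com/2224.html
```

The base URL matters: resolving the same scheme-less string against the feed URL `https://www.whlihun.com/2224.html/feed` would give `/2224.html/www.whlihun.com/2224.html` instead, which is not the path in the log.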

This post is a user submission and does not represent the views of 微盟圈. If you republish it, please credit the source: https://www.vm7.com/a/ask/80871.html