

Why are there no results when I crawl with the pyspider framework?

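I am using pyspider to crawl job listings from 51job. When I click run on the code page to verify the script, the output is correct, but after I go back to the dashboard and run the project, nothing shows up in results. This happens every time. Could someone please take a look?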

Code:

from pyspider.libs.base_handler import *

class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://jobs.51job.com/', callback=self.index_page, validate_cert=False, age=0)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('.e5 .lkst a').items():
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False, age=0)

    @config(priority=2)
    def detail_page(self, response):
        for each in response.doc('.e .info .title a').items():
            self.crawl(each.attr.href, callback=self.detail_page_next, validate_cert=False, age=0,retries=3)
        # Follow each '.bk a' link (presumably pagination) with this same callback
        for each in response.doc('.bk a').items():
            print("deep")
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False, age=0)

    @config(priority=1)
    def detail_page_next(self, response):
        return {
            "company": response.doc('.cname').text(),
            "company_size": response.doc('.ltype').text(),
            "position": response.doc('h1').text(),
            "salary": response.doc('.cn strong').text(),
            "description": response.doc('.job_msg').text(),
            "location": response.doc('.lname').text(),
        }

The check on the code page is correct:
[screenshot]

Dashboard:
[screenshot]

results:
[screenshot]

Answer
執念

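Try the script below. pyspider's scheduler runs higher-priority tasks first, so giving detail_page priority=2 (and the listing callback index_page priority=1) makes results appear sooner.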

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2018-01-22 12:13:12
# Project: 51job


from pyspider.libs.base_handler import *

class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://jobs.51job.com/', callback=self.main_index, validate_cert=False, age=0)

    @config(age=10 * 24 * 60 * 60)
    def main_index(self, response):
        for each in response.doc('.e5 .lkst a').items():
            self.crawl(each.attr.href, callback=self.index_page, validate_cert=False, age=0)

    @config(priority=1)
    def index_page(self, response):
        for each in response.doc('.e .info .title a').items():
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False, age=0,retries=3)
        # Follow each '.bk a' link (presumably pagination) with this same callback
        for each in response.doc('.bk a').items():
            print("deep")
            self.crawl(each.attr.href, callback=self.index_page, validate_cert=False, age=0)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "company": response.doc('.cname').text(),
            "company_size": response.doc('.ltype').text(),
            "position": response.doc('h1').text(),
            "salary": response.doc('.cn strong').text(),
            "description": response.doc('.job_msg').text(),
            "location": response.doc('.lname').text(),
        }
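
To see why swapping the priorities matters, here is a rough toy model of a priority queue. It is only a sketch and not pyspider's actual scheduler: the function name tasks_before_first_result, the page and link counts, and the FIFO tie-breaking are all assumptions made for illustration. It counts how many listing pages get processed before the first task that returns a result is picked up.

```python
import heapq
from itertools import count


def tasks_before_first_result(listing_priority, detail_priority,
                              pages=50, jobs_per_page=20):
    """Count the listing tasks processed before the first detail task runs.

    Toy model (an assumption, not pyspider internals): the queue always
    pops the highest-priority task, and equal priorities are served FIFO.
    """
    order = count()  # tie-breaker so equal priorities stay first-in, first-out
    queue = []

    def push(priority, kind, page):
        # heapq is a min-heap, so negate the priority to pop the highest first
        heapq.heappush(queue, (-priority, next(order), kind, page))

    push(listing_priority, "listing", 1)
    processed = 0
    while queue:
        _, _, kind, page = heapq.heappop(queue)
        if kind == "detail":
            # a detail task is the one whose callback returns a result dict
            return processed
        processed += 1
        # a listing page schedules its job links, then the next listing page
        for _ in range(jobs_per_page):
            push(detail_priority, "detail", page)
        if page < pages:
            push(listing_priority, "listing", page + 1)
    return processed


# Priorities as in the question: listing pages outrank detail pages
print(tasks_before_first_result(listing_priority=2, detail_priority=1))  # 50
# Priorities as in the script above: detail pages outrank listing pages
print(tasks_before_first_result(listing_priority=1, detail_priority=2))  # 1
```

In this model, the question's priorities make the crawler walk through every listing page before it touches a single job page, while the swapped priorities produce a result right after the first listing page. On the real site the numbers will differ, but the ordering effect should be the same.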
October 6, 2017, 05:04