[2018-11-29]
1 crawler/crawler_sys/framework/scrap_list_page_async.py
  1.1 Move lst_page_conf.ini to crawler/crawler_sys/config/sites/list_page_urls.ini;
  1.2 In list_page_urls.ini, give each site its own [section] header; every site name must match the one in crawler/crawler_sys/framework/platform_crawler_register.py;
  1.3 Change the default of args.platform to '' (currently '腾讯视频'); when parsing arguments, exit immediately if platform == '';
  1.4 If args.platform is not empty, check whether it is registered in platform_crawler_register.py; if it is not, exit the program.
2 Naming conventions, covering both file names and function names (lowest priority; can be done last if time allows): lst_page -> list_page

[2018-12-25]
1 For a releaser_page crawler, the function must be named releaser_page so that the framework can import it.
2 The releaser_page function takes releaserUrl as its input variable; other functions such as get_releaser_id and get_releaser_uk must be included in this function.
3 es_index and doc_type must be given so that some if/else branches in the output process can be removed. Initially, if es_index is None, it defaults to crawler-data-raw.
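
The checks in items 1.2 to 1.4 of the 2018-11-29 entry and the es_index default in item 3 of the 2018-12-25 entry could be sketched roughly as below. This is only an illustration under assumptions: REGISTERED_PLATFORMS is a hypothetical stand-in for whatever platform_crawler_register.py actually exposes, and build_parser, validate_platform, load_site_urls, and resolve_es_index are made-up helper names, not functions from the repository.

```python
import argparse
import configparser
import sys

# Hypothetical stand-in for the registry in platform_crawler_register.py;
# the real module's contents are not shown in these notes.
REGISTERED_PLATFORMS = {"腾讯视频", "example_site"}

# Default index named in the 2018-12-25 entry.
DEFAULT_ES_INDEX = "crawler-data-raw"


def build_parser():
    """Build an argument parser whose platform default is '' (item 1.3)."""
    parser = argparse.ArgumentParser(description="list page crawler (sketch)")
    parser.add_argument("-p", "--platform", default="", type=str)
    return parser


def validate_platform(platform, registered=REGISTERED_PLATFORMS):
    """Exit if platform is empty (1.3) or not registered (1.4)."""
    if platform == "":
        sys.exit("platform must be given")
    if platform not in registered:
        sys.exit("platform %r is not registered" % platform)
    return platform


def load_site_urls(ini_text, platform):
    """Read the [section] for one site from list_page_urls.ini text (1.2)."""
    # interpolation=None keeps '%' characters in URLs from being expanded.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    if not config.has_section(platform):
        return {}
    return dict(config.items(platform))


def resolve_es_index(es_index=None):
    """Fall back to the default index when es_index is None."""
    return DEFAULT_ES_INDEX if es_index is None else es_index
```

A caller would parse the arguments, pass args.platform through validate_platform before doing any work, and then look up that platform's section in the ini file. Keeping the section names identical to the registered platform names is what allows a single string to drive both lookups.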