backend / crawler
Commit 32ad8f2c authored 4 years ago by litaolemo
update
Parent: 138b90ca
Branches: master, litao, mr/develop/xiaohongshu, soyang
No related merge requests found
Showing 2 changed files with 5 additions and 3 deletions

crawler_sys/framework/search_page_multi_process.py  +0 -0
crawler_sys/site_crawler_by_redis/crawler_tudou.py  +5 -3
crawler_sys/framework/search_page_multi_process.py
This diff is collapsed.
crawler_sys/site_crawler_by_redis/crawler_tudou.py
@@ -878,13 +878,15 @@ class Crawler_tudou():
 if __name__ == '__main__':
     test = Crawler_tudou()
     # url = 'https://video.tudou.com/v/XNDExNjcyNTI0MA==.html'
-    releaser_url = "https://id.tudou.com/i/UNTUzMTU1ODg2OA=="
+    releaser_url = "https://i.youku.com/i/UNzI3OTI2MTkyOA==/videos?spm=a2hzp.8244740.0.0"
     # ttt = test.video_page("https://video.tudou.com/v/XNDExNjcyNTI0MA==.html")
     #releaserUrl=url, output_to_es_raw=True,
     # es_index='crawler-data-raw',
     # doc_type='doc',
     # releaser_page_num_max=100)
-    test.releaser_page_by_time(1569081600000, 1570610953322, releaser_url, output_to_es_raw=True,
-                               es_index='crawler-data-raw', doc_type='doc', releaser_page_num_max=4000)
+    sacn_Res = test.releaser_page_by_time(1569081600000, 1570610953322, releaser_url, output_to_es_raw=True,
+                                          es_index='crawler-data-raw', doc_type='doc', releaser_page_num_max=4000, allow=20)
+    for res in sacn_Res:
+        print(res)
     # test.get_releaser_image(releaser_url)
     # test.get_releaser_follower_num()
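
The first two positional arguments to releaser_page_by_time in the hunk above (1569081600000 and 1570610953322) look like Unix timestamps in milliseconds that bound the crawl window. That reading is an assumption, since the method signature is not visible in this diff. Under that assumption, a minimal sketch for converting such values to and from readable UTC datetimes could look like the following. The helper names ms_to_utc and utc_to_ms are hypothetical and are not part of the crawler code.

```python
from datetime import datetime, timedelta, timezone

# Reference point for Unix time, as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms: int) -> datetime:
    """Convert integer milliseconds since the Unix epoch to an aware UTC datetime.

    Hypothetical helper; uses timedelta arithmetic to avoid float rounding.
    """
    return EPOCH + timedelta(milliseconds=ms)


def utc_to_ms(dt: datetime) -> int:
    """Convert an aware datetime back to integer milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    # Values copied from the diff; assumed (not confirmed) to be start/end bounds.
    start_ms, end_ms = 1569081600000, 1570610953322
    print(ms_to_utc(start_ms).isoformat())
    print(ms_to_utc(end_ms).isoformat())
```

If the assumption holds, the window in the test block runs from 21 September 2019 to 9 October 2019 in UTC, and utc_to_ms can be used to build new bounds from a datetime instead of hard-coding integers.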