rank / strategy_spider
Commit 385821df, authored Jan 13, 2020 by 段英荣
Add search crawling feature
parent 0797736e
Showing 1 changed file with 28 additions and 25 deletions

zhihu_login.py  +28 -25
@@ -249,31 +249,34 @@ class ZhihuAccount(object):
 if "data" in raw_content_dict:
     for data_item in raw_content_dict["data"]:
         if data_item["type"] == "search_result":
-            data_type = data_item["object"]["type"]
-            content = data_item["object"]["content"]
-            platform_id = data_item["object"]["id"]
-            user_id = random.choice(majia_user_list)
-            question_id = ""
-            if data_type == "article":
-                title = data_item["object"]["title"]
-            elif data_type == "answer":
-                title = data_item["object"]["question"]["name"]
-                question_id = data_item["object"]["question"]["id"]
-            else:
-                print("type is: %s" % data_type)
-                title = ""
-            item_dict = {
-                "user_id": user_id,
-                "platform_id": platform_id,
-                "title": title,
-                "content": content,
-                "type": data_type,
-                "question_id": question_id
-            }
-            zhihu_spider_fd.write(json.dumps(item_dict) + "\n")
+            try:
+                data_type = data_item["object"]["type"]
+                content = data_item["object"]["content"]
+                platform_id = data_item["object"]["id"]
+                user_id = random.choice(majia_user_list)
+                question_id = ""
+                if data_type == "article":
+                    title = data_item["object"]["title"]
+                elif data_type == "answer":
+                    title = data_item["object"]["question"]["name"]
+                    question_id = data_item["object"]["question"]["id"]
+                else:
+                    print("type is: %s" % data_type)
+                    title = ""
+                item_dict = {
+                    "user_id": user_id,
+                    "platform_id": platform_id,
+                    "title": title,
+                    "content": content,
+                    "type": data_type,
+                    "question_id": question_id
+                }
+                zhihu_spider_fd.write(json.dumps(item_dict) + "\n")
+            except:
+                print(str(data_item))
 time.sleep(2)
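The change above wraps the per-item parsing in a try/except so that one malformed search result is logged and skipped instead of aborting the crawl. A minimal sketch of the same idea as a standalone function follows. The name `parse_search_item` is hypothetical and does not appear in the commit, and catching only `KeyError` and `TypeError` is a narrower variant swapped in here, not the bare `except:` the commit uses.

```python
import json
import random


def parse_search_item(data_item, majia_user_list):
    """Return a flat record for one search result, or None if it cannot be parsed.

    Hypothetical helper: field names follow the diff, the function itself does not.
    """
    try:
        if data_item["type"] != "search_result":
            return None
        obj = data_item["object"]
        data_type = obj["type"]
        title = ""
        question_id = ""
        if data_type == "article":
            title = obj["title"]
        elif data_type == "answer":
            title = obj["question"]["name"]
            question_id = obj["question"]["id"]
        return {
            "user_id": random.choice(majia_user_list),
            "platform_id": obj["id"],
            "title": title,
            "content": obj["content"],
            "type": data_type,
            "question_id": question_id,
        }
    except (KeyError, TypeError):
        # A missing field or an unexpected shape: skip this item.
        return None


if __name__ == "__main__":
    sample = {
        "type": "search_result",
        "object": {
            "type": "answer",
            "id": 1,
            "content": "body",
            "question": {"name": "a question", "id": 2},
        },
    }
    record = parse_search_item(sample, ["u1"])
    if record is not None:
        print(json.dumps(record))
```

Returning `None` for bad items keeps the write step in the caller, which can then decide whether to log the raw `data_item` as the commit does.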