Chengyang Zhong / crawler / Commits

Commit f39d7a70, authored 5 years ago by litaolemo

    update

parent ebd80175
Branches containing this commit: master, xiangwan
No related merge requests found.

Showing 1 changed file with 7 additions and 2 deletions:
crawler_sys/scheduler/push_crawler_data_to_mysql.py (+7 -2)

The commit makes two changes: it wraps tractate_id in str() before string concatenation in send_email, and it adds a counter to scan_es_to_mysql so that notification emails are sent in batches of 1000 records instead of only once at the end.
@@ -29,8 +29,8 @@ def send_email(query_id_dict: Dict):
 新的query:{search_keyword}抓取内容需要审核,帖子号为\n
 """.format(search_keyword=search_keyword,)
     for tractate_id in query_id_dict[search_keyword]:
-        body_str += tractate_id + ", "
-        print("line25", tractate_id)
+        body_str += str(tractate_id) + ", "
+        print("line25", str(tractate_id))
     send_file_email("", "",
                     email_group=["<hongxu@igengmei.com>", "<yangjiayue@igengmei.com>",

(The Chinese string literal reads: "New query: {search_keyword} crawled content needs review; the post IDs are".)
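The str() wrapping above matters because concatenating an int to a str in Python raises TypeError, so the original `body_str += tractate_id + ", "` would crash whenever Elasticsearch returned a numeric ID. A minimal sketch of the fixed behavior (build_body is a hypothetical helper, not a function from this repo):

```python
def build_body(ids):
    """Join post IDs into a comma-separated string, as send_email does.

    Wrapping each ID in str() makes the concatenation safe whether the
    ID arrives as an int or a str; without it, an int raises TypeError.
    """
    body_str = ""
    for tractate_id in ids:
        body_str += str(tractate_id) + ", "
    return body_str
```

For example, `build_body([123, "456"])` returns `"123, 456, "`, whereas the pre-commit version would fail on the integer 123.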
@@ -54,6 +54,7 @@ def scan_es_to_mysql():
             }
         }
     }
+    count = 0
     scan_res = scan(client=es_framework, query=search_query, index="crawler-data-raw")
     for res in scan_res:
         if_exists = rds.sismember("article_id_list", res["_id"])

@@ -70,6 +71,10 @@ def scan_es_to_mysql():
             rds.sadd("article_id_list", res["_id"])
             search_word = data["search_word"]
             query_id_dict[search_word][tractate_id] = 1
+            count += 1
+            if count % 1000 == 0:
+                send_email(query_id_dict)
+                query_id_dict = {}
     send_email(query_id_dict)
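The second half of the diff introduces a counter so that send_email fires on every 1000th record and the accumulator is reset after each flush, with one final unconditional send after the loop. The pattern can be sketched in isolation (flush_in_batches and its send callback are hypothetical names for illustration, not from this repo):

```python
def flush_in_batches(items, batch_size=1000, send=print):
    """Accumulate (key, value) pairs and flush every batch_size records.

    Mirrors the commit's pattern in scan_es_to_mysql: `send` stands in
    for send_email, and batch_size=1000 matches the `count % 1000` check
    in the diff. Like the original code, a final flush is issued
    unconditionally after the loop for any remaining records.
    """
    batch, count = {}, 0
    for key, value in items:
        batch.setdefault(key, {})[value] = 1  # same shape as query_id_dict
        count += 1
        if count % batch_size == 0:
            send(batch)
            batch = {}  # reset, as the diff does with query_id_dict = {}
    send(batch)  # final flush for the tail of the stream
```

With 2,500 records and a batch size of 1,000, `send` is invoked three times: twice mid-stream and once for the remaining 500 records. Resetting the dict after each flush keeps memory bounded and prevents already-reported IDs from being emailed again.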