Correctly handle HTML with no <body> tags.

tags/v0.2
Ben Kurtovic 10 years ago
parent
commit
3dde1c5d60
1 changed file with 4 additions and 0 deletions
earwigbot/wiki/copyvios/parsers.py (+4, -0)

@@ -136,6 +136,10 @@ class HTMLTextParser(BaseTextParser):
         except ValueError:
             soup = bs4.BeautifulSoup(self.text).body
 
+        if not soup:
+            # No <body> tag present in HTML ->
+            # no scrapable content (possibly JS or <frame> magic):
+            return ""
         is_comment = lambda text: isinstance(text, bs4.element.Comment)
         for comment in soup.find_all(text=is_comment):
             comment.extract()
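
The early return added in this hunk can be illustrated with a small stand-alone sketch. It is not the earwigbot code: it swaps BeautifulSoup for the standard library's html.parser so that it runs without third-party packages, and the names BodyTextCollector and body_text are hypothetical. The idea is the same as in the patch: if the page has no <body>, return an empty string instead of trying to walk a missing node.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collect text found inside <body> (hypothetical helper, not earwigbot's)."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.saw_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            self.saw_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Comments are routed to handle_comment(), which stays a no-op here,
        # so only real text nodes are collected.
        if self.in_body:
            self.chunks.append(data)


def body_text(html):
    """Return the text inside <body>, or "" when the page has no <body>."""
    parser = BodyTextCollector()
    parser.feed(html)
    parser.close()
    if not parser.saw_body:
        # Same early exit as the patch: nothing scrapable without a <body>.
        return ""
    # Collapse runs of whitespace left over from the markup.
    return " ".join(" ".join(parser.chunks).split())


# A frameset page has no <body>, so there is nothing to scrape.
print(repr(body_text('<html><frameset><frame src="a.html"></frameset></html>')))
# A normal page yields its text with the comment dropped.
print(repr(body_text("<html><body><!-- note --><p>Hello</p> world</body></html>")))
```

Without the guard, the code that follows in the patched method would call find_all on a missing body and fail, which is presumably the failure this commit addresses.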

