Brett Langdon d1b25d0762
Docs are good
9 years ago

soup-schema
===========

Define schemas for parsing HTML with BeautifulSoup4_.

.. _BeautifulSoup4: https://www.crummy.com/software/BeautifulSoup/

Installing
----------

.. code:: bash

    pip install soup_schema


Example usage
-------------

.. code:: python

    from soup_schema import Schema, Selector, AttrSelector

    class PageSchema(Schema):
        content = Selector('#content', required=True)
        description = Selector('[name=description]')
        stylesheets = AttrSelector('[rel=stylesheet]', 'href', as_list=True)
        title = Selector('title', required=True)


    html = """
    <html>
    <head>
    <title>My page title</title>
    <link rel="stylesheet" href="/dist/css/third-party.css" />
    <link rel="stylesheet" href="/dist/css/style.css" />
    <meta name="description" content="This is my page description" />
    </head>
    <body>
    <div id="content">
    <p>This is my page content</p>
    </div>
    </body>
    </html>
    """

    page = PageSchema.parse(html)
    print(page)
    # PageSchema(
    #     content='\nThis is my page content\n',
    #     description='This is my page description',
    #     stylesheets=['/dist/css/third-party.css', '/dist/css/style.css'],
    #     title='My page title'
    # )
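
For comparison, here is a rough sketch of the same extraction written by hand against BeautifulSoup4. It is not taken from the soup_schema source and may differ from what the library does internally. The ``extract_page`` helper, the plain ``dict`` result, and the ``ValueError`` raised for missing required elements are all assumptions made for illustration, and the ``beautifulsoup4`` package is assumed to be installed.

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

HTML = """
<html>
<head>
<title>My page title</title>
<link rel="stylesheet" href="/dist/css/third-party.css" />
<link rel="stylesheet" href="/dist/css/style.css" />
<meta name="description" content="This is my page description" />
</head>
<body>
<div id="content">
<p>This is my page content</p>
</div>
</body>
</html>
"""


def extract_page(html):
    """Hypothetical hand-written stand-in for PageSchema; not part of soup_schema."""
    soup = BeautifulSoup(html, "html.parser")

    # The two fields marked required=True in the schema.
    content = soup.select_one("#content")
    title = soup.select_one("title")
    if content is None or title is None:
        # Assumed error handling; soup_schema may report this differently.
        raise ValueError("missing required element")

    # Optional field: fall back to None when the <meta> tag is absent.
    description = soup.select_one("[name=description]")

    return {
        "content": content.get_text(),
        "description": description.get("content") if description else None,
        # One href per matching <link>, in document order.
        "stylesheets": [link["href"] for link in soup.select("[rel=stylesheet]")],
        "title": title.get_text(),
    }


page = extract_page(HTML)
print(page["title"])
print(page["stylesheets"])
```

The schema class states the same four selectors declaratively, so the lookups, the optional/required distinction, and the list handling do not have to be repeated for every page type.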