Beautiful Soup
Details
| Size: | 69K |
| Last Update: | 2008-05-02 00:13:46 |
| Version: | 3.0.3 |
| OS Support: | Linux |
| License/Program Type: | Python License |
| Publisher: | Leonard Richardson |
| Price: | $0.00 |
Description:
Beautiful Soup 3.0.3 is markup software developed by Leonard Richardson.
The Beautiful Soup project is a Python HTML/XML parser designed for quick-turnaround projects like screen-scraping. Three features make it powerful:
Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.
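The case where you have to specify the original encoding yourself comes down to a decode-then-encode round trip. The sketch below illustrates that idea with the Python standard library only; it does not use Beautiful Soup, and the helper name `to_utf8` and the sample document are hypothetical.

```python
def to_utf8(raw: bytes, original_encoding: str) -> bytes:
    """Decode raw bytes to Unicode text, then re-encode the text as UTF-8."""
    text = raw.decode(original_encoding)   # incoming document -> Unicode
    return text.encode("utf-8")            # outgoing document -> UTF-8


# A made-up document stored as ISO-8859-1 with no encoding declaration.
latin1_doc = "<p>café</p>".encode("iso-8859-1")

# Decoding with the correct original encoding recovers the text.
text = latin1_doc.decode("iso-8859-1")

# The same content, re-encoded as UTF-8.
utf8_doc = to_utf8(latin1_doc, "iso-8859-1")
```

Decoding the same bytes as UTF-8 would fail, which is why the original encoding has to be supplied when it cannot be detected.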
Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."
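The four requests above might look roughly like this in code. This is a sketch under assumptions: it uses the later `bs4` package and its `find_all` method rather than the 3.0.3 release described on this page, it picks the standard-library `html.parser` backend, and the sample markup is invented for illustration.

```python
import re

# Assumption: the later bs4 package, not the 3.0.3 module listed here.
from bs4 import BeautifulSoup

# Invented sample page.
HTML = """
<html><body>
<a class="externalLink" href="http://foo.com/page">Foo</a>
<a href="/local">Local</a>
<a class="externalLink" href="http://bar.org/">Bar</a>
<table><tr><th><b>Price</b></th><th>Name</th></tr></table>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "Find all the links"
all_links = soup.find_all("a")

# "Find all the links of class externalLink"
external_links = soup.find_all("a", class_="externalLink")

# "Find all the links whose URLs match 'foo.com'"
foo_links = soup.find_all("a", href=re.compile(r"foo\.com"))

# "Find the table heading that's got bold text, then give me that text"
bold_heading = next(th for th in soup.find_all("th") if th.find("b") is not None)
heading_text = bold_heading.find("b").get_text()
```

Each query returns ordinary Python objects (lists of tags, strings), so the results can be filtered or printed without writing a custom parser.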
Valuable data that was once locked up in poorly designed websites is now within your reach. Projects that would have taken hours take only minutes with Beautiful Soup.
Requirements:
Python
What's New in This Release:
Beautiful Soup can now convert invalid HTML or XML into something approaching XHTML or valid XML.
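As a rough illustration of that repair step, the sketch below parses a fragment with unclosed tags and serializes it again. It assumes the later `bs4` package with the `html.parser` backend, not the 3.0.3 release itself, and the fragment is made up.

```python
# Assumption: the later bs4 package, not the 3.0.3 module listed here.
from bs4 import BeautifulSoup

# Invalid fragment: neither <p> nor <b> is ever closed.
broken = "<p>Hello <b>world"

soup = BeautifulSoup(broken, "html.parser")

# Serializing the parse tree writes out the closing tags the input lacked.
fixed = str(soup)
print(fixed)
```

The output is not guaranteed to be valid XHTML, only something closer to well-formed markup than the input was.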
Beautiful Soup 3.0.3 has an English-language interface and runs on Linux.
Downloading Beautiful Soup 3.0.3 takes only a few seconds over a fast ADSL connection.
Related Software
| From category: Markup | dvipng 1.9 is markup software developed by Jan-Åke Larsson. This program makes PNG and/or GIF graphics from DVI files as obtained from TeX and its relatives. If GIF support is enabled, GIF o... |
| From category: Markup | eq2png 0.01 is markup software developed by J. Scott Olsson. eq2png is a simple Perl script to make it painless to produce Portable Network Graphic (PNG) images for OpenOffice Impress presentations... |
| From category: Others | pyhtmlhelp is a cross-platform tool written in Python for converting among CHM, HTB, and DevHelp formats.... |
| From category: Others | aspell-fo 0.2.25 is others software developed by Jacob Sparre Andersen. aspell-fo is a Faroese dictionary for aspell based on Føroyski orðalistin til rættlestur (The Faroese Spellchecking Dictionar... |
| From category: Others | bdf2psf is a font converter that lets you use any of the Adobe BDF fonts that are bundled with X Windows on the Linux console.... |
| From category: Markup | Prince is a batch formatter for converting XML into PDF by applying Cascading Style Sheets (CSS).... |
| From category: Markup | HTML2fo is an HTML to XSL:FO converter.... |
| From category: Markup | DoceboLMS 3.0.3 is markup software developed by Docebo SRL. DoceboLMS (previously "Spaghettilearning") is an e-learning platform for distance learning. DoceboLMS project supports SCORM 1.2, the e... |
| From category: Markup | Morla is an editor of RDF documents, written in C for the GNU/Linux operating system.... |
| From category: IDEs | Eclipse 3.2 is IDE software developed by Eclipse Foundation. Eclipse project is an open source community whose projects are focused on providing an extensible development platform and application... |
| From category: Others | Faroese Spell Checking Dictionary 0.2.26 is others software developed by Jacob Sparre Andersen. Faroese Spell Checking Dictionary is intended to be used with programs like aspell and ispell. ... |
| From category: Others | Dictorg 0.7 is others software developed by Kristian Gunstone. Dictorg is a basic dict.org parser script for console use. The project is written in Perl and uses Curl::easy to access dict.org. ... |
| From category: Others | TEA text editor is a modest and easy-to-use editor with many useful features for HTML editing.... |
| From category: Emacs | Emacs Common Lisp 20061030 is Emacs software developed by Lars Brinkhoff. Emacs Common Lisp is an implementation of Common Lisp, written in Emacs Lisp. It does not yet purport to conform to the ANS... |
| From category: Others | bbe 0.2.2 is others software developed by Timo Savinen. bbe is a sed-like editor for binary files. bbe performs basic byte operations on blocks of input stream. bbe is command line tools dev... |