ASPseek 1.2.10

 

Details

Size: 1105K
Last Update: 2008-04-22 11:07:20
OS Support: Linux
License/Program Type: GPL (GNU General Public License)
Publisher: SWsoft
Price: $0.00
Description:

ASPseek 1.2.10 is HTTP (WWW) software developed by SWsoft.
ASPseek is Internet search engine software developed by SWsoft and licensed as free software under the GNU GPL.



ASPseek consists of an indexing robot, a search daemon, and a CGI search frontend. It can index as many as a few million URLs and search for words and phrases, use wildcards, and do a Boolean search. Search results can be limited to a given time period, site, or Web space (a set of sites) and sorted by relevance (PageRank is used) or by date.

ASPseek is optimized for multiple sites (threaded index, async DNS lookups, grouping results by site, Web spaces), but can be used for searching one site as well. ASPseek can work with multiple languages and encodings at once (including multibyte encodings such as Chinese) thanks to its Unicode storage mode. Other features include stopwords and ispell support, a charset and language guesser, HTML templates for search results, excerpts, and query word highlighting.

ASPseek is written in C++ using the STL, and uses a mix of an SQL database and binary files for storage.

Here are some key features of "ASPseek":
Ability to index and search through several million documents

Using ASPseek, you can build a database and search through many sites, and results for each query will be returned quickly even if you have a few million documents indexed. Of course, this depends on hardware, so don't expect a "good old" i486 machine to handle every site in the .com domain. Everything depends on CPU(s), memory, disk speed, and so on, so run your own tests before you buy dedicated hardware.

The fact that ASPseek is optimized for high volumes should not stop you from using it to search your own site that contains a few hundred documents; it works there as well.

Very good relevancy of results

The purpose of a search engine is to find what the user wants. A search query can return thousands of URLs, but if they are all irrelevant, the user will be unsatisfied.

ASPseek sorts its results by relevancy (or rank), but rank calculation is not an easy task. The developers tried their best to incorporate the latest and greatest techniques into the ASPseek engine while maintaining good search speed.

Ispell support

When ASPseek is used with ispell support, searchd(1) can optionally find all forms of each specified word (example: create --> create OR created OR creates). This allows you to find a word in all of its different forms.
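
The expansion in the example above can be sketched in a few lines of Python. This is only an illustration: the WORD_FORMS table and the expand_query function are hypothetical names, and the table is hard-coded here, whereas ASPseek derives word forms from ispell dictionaries.

```python
# Hypothetical word-form table for illustration only;
# ASPseek itself derives forms from ispell files.
WORD_FORMS = {
    "create": ["create", "created", "creates"],
}


def expand_query(words, forms=WORD_FORMS):
    """Replace each query word with an OR-group of its known forms."""
    groups = []
    for word in words:
        variants = forms.get(word.lower(), [word])
        if len(variants) > 1:
            groups.append("(" + " OR ".join(variants) + ")")
        else:
            groups.append(variants[0])
    return " ".join(groups)


print(expand_query(["create", "index"]))
```

Words with no entry in the table pass through unchanged, so the expansion only widens a query and never drops a term.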

Unicode storage mode

ASPseek can store information about documents in Unicode, which makes it possible to implement a multi-language search engine. So you can index and search documents in English, Russian, and even Chinese, all in one database.

HTTP, HTTPS, HTTP proxy, FTP (via proxy) protocols

As ASPseek is a Web search engine, it uses the HTTP protocol to index sites. ASPseek also supports the secure https:// protocol. The FTP protocol is not supported directly, but you can use a proxy (such as squid) and index FTP sites through it.

ASPseek supports the "basic authorization" feature of HTTP, so you can index password-protected areas (for example, private information on your intranet).
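
For readers unfamiliar with HTTP basic authorization, the following Python sketch shows what a robot has to send: the user name and password joined by a colon, base64-encoded, in an Authorization header. The function name basic_auth_header is made up for this example and is not part of ASPseek.

```python
import base64


def basic_auth_header(user, password):
    """Build the HTTP header used by the "basic" authorization scheme."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}


print(basic_auth_header("Aladdin", "open sesame"))
```

Note that base64 is an encoding, not encryption, so the password is only as protected as the connection it travels over.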

Text/html and text/plain document types support

ASPseek can understand documents written in HTML and plain text documents. These are the most popular formats on the Internet.

Other formats, such as PDF and RTF, can be supported with the help of any external program or script that is able to convert those formats to HTML or plain text.

Multithreaded design, async DNS resolver etc

ASPseek uses POSIX threads, which means that one process has many threads running in parallel. So index(1) downloads documents from many sites, and the search daemon processes many search queries, simultaneously. This not only helps ASPseek scale well on SMP (multiprocessor) systems, but also improves indexing speed, because with a single thread most of the time would be spent waiting for data from the network.

One thing that slows the indexing process down a lot is DNS lookup (the process of determining an IP address from a server name). To avoid delays, asynchronous lookups (lookups are done by separate dedicated processes) and an IP address cache are implemented.
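
The idea of combining asynchronous lookups with an address cache can be sketched as follows. This is not ASPseek's code: the sketch uses a Python thread pool instead of the separate resolver processes described above, the CachingResolver class is a hypothetical name, and resolve_async is assumed to be called from a single thread.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


class CachingResolver:
    """Resolve host names in background workers and remember the answers."""

    def __init__(self, lookup=socket.gethostbyname, workers=4):
        self._lookup = lookup
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._cache = {}  # host name -> Future holding the IP address

    def resolve_async(self, host):
        """Return a Future for the host's IP; repeated hosts reuse the cached Future."""
        future = self._cache.get(host)
        if future is None:
            future = self._pool.submit(self._lookup, host)
            self._cache[host] = future
        return future
```

The caller can start lookups for many hosts at once and collect each address later with future.result(), so a slow name server delays only the documents that depend on it.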

Stopwords

Stopwords are words that have no meaning by themselves. Examples: is, are, at, this. Searching for "at" is useless, so such words are excluded from the search query. Stopwords are also excluded from the database during indexing, so the database becomes smaller and faster.

There are no "built-in" stopwords in ASPseek; they are loaded from files during start-up. Many stopword files for different languages come with ASPseek.
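
A minimal Python sketch of stopword filtering is shown below. The file layout (one word per line, with "#" starting a comment line) is an assumption made for the example and may differ from the stopword files shipped with ASPseek; both function names are hypothetical.

```python
def load_stopwords(lines):
    """Build a stopword set from lines, assuming one word per line."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def strip_stopwords(query_words, stopwords):
    """Drop stopwords from a query, keeping the remaining words in order."""
    return [w for w in query_words if w.lower() not in stopwords]


stopwords = load_stopwords(["is\n", "are\n", "at\n", "this\n"])
print(strip_stopwords(["this", "is", "ASPseek"], stopwords))
```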

Charset guesser

Some broken or misconfigured servers don't tell clients the charset in which they provide content. If you are indexing such servers, or using ASPseek to index FTP servers (the FTP protocol knows nothing about charsets), the charset guesser can be used to deal with it. The charset guesser uses word frequency tables (called langmaps) to determine the correct charset.
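
To give a feel for frequency-based guessing, here is a much simplified Python sketch: it decodes the raw bytes with each candidate charset and keeps the one that yields the most known frequent words. The guess_charset function and the tiny word set are assumptions for illustration; real langmaps are far larger, and ASPseek's actual scoring is not described here.

```python
def guess_charset(raw, candidates, frequent_words):
    """Pick the candidate charset whose decoding contains the most frequent words."""
    best, best_score = None, -1
    for charset in candidates:
        try:
            text = raw.decode(charset)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this charset at all
        score = sum(1 for word in text.lower().split() if word in frequent_words)
        if score > best_score:
            best, best_score = charset, score
    return best


# A few very common Russian words stand in for a langmap here.
RUSSIAN_WORDS = {"и", "в", "не", "на"}
raw = "он не в доме и не на работе".encode("koi8-r")
print(guess_charset(raw, ["cp1251", "koi8-r"], RUSSIAN_WORDS))
```

Both candidate charsets decode these bytes without an error, so only the word counts separate the right guess from the wrong one.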

Robot exclusion standard (robots.txt) support

ASPseek fully supports this standard. It lets Web site authors tell a robot (for example, ASPseek's index(1)) to skip indexing some directories of their sites.

For more information see http://www.robotstxt.org/wc/robots.html
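
As an illustration of what the standard asks of a robot, the sketch below uses Python's standard urllib.robotparser module to check URLs against a sample robots.txt. The sample rules, the allowed helper, and the "ExampleBot" user agent are invented for the example; this is not how index(1) itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content a site author might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the robots.txt rules let this agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://www.example.com/index.html"))
print(allowed("http://www.example.com/private/report.html"))
```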

Settings to control network bandwidth usage and Web servers load

You can precisely control the network bandwidth that index(1) uses. Specifically, you can limit the bandwidth (expressed in bytes per second) used by index(1) for a given time of day. For example, you can limit the bandwidth during business hours so people at your office will not experience slow Internet access.

You can also set the minimum time between two requests to the same Web server, so it will not be overloaded and brought to its knees while you run index(1).
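
The per-server delay can be pictured with the following Python sketch, which remembers when each host was last contacted and reports how long to wait before the next request. HostThrottle and its methods are hypothetical names; in ASPseek this behaviour is controlled through configuration settings, not through code like this.

```python
import time


class HostThrottle:
    """Track the last request time per host and enforce a minimum gap."""

    def __init__(self, min_delay, clock=time.monotonic):
        self.min_delay = min_delay  # seconds between requests to one host
        self._clock = clock
        self._last = {}  # host -> time of the most recent request

    def wait_time(self, host):
        """Seconds the caller should still wait before contacting host."""
        last = self._last.get(host)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self._clock() - last))

    def mark(self, host):
        """Record that a request to host is being sent now."""
        self._last[host] = self._clock()
```

A crawler thread would call wait_time(host), sleep for that long if needed, then call mark(host) just before sending the request. Other hosts are unaffected, so the total indexing rate stays high.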

Real-time asynchronous indexing

Some search engines require that searching be stopped while the database is updated. ASPseek does not need this, so you can search non-stop.

In addition, there is a special mode of indexing called "real-time" indexing. You can use it for a small number of documents, and as soon as such a document is downloaded and processed, the changes are immediately visible in the search interface. This feature is a great help if you are building a search engine for pages with rapidly changing content, such as online news.

Note that the number of documents in the "real-time" database is limited. It's about 1000 on our hardware (your mileage may vary), and the more documents you have in the "real-time" database, the slower indexing into that (and only that) database will be. This will not affect search speed, though.

Documents from the "real-time" database are moved to the normal database after running index(1) in the normal way.

Sorting results by relevance or by date

Search engines usually return the most relevant results first. But if you are looking for the latest pages, you can tell ASPseek to sort results by last modification date, so recently modified (or created) pages will be displayed first.
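
The two orderings can be illustrated with a short Python sketch. The result records, their "rank" and "modified" fields, and the sort_results function are all invented for the example and do not reflect ASPseek's internal data structures.

```python
from datetime import date


def sort_results(results, by="relevance"):
    """Return results best-first by rank, or newest-first by modification date."""
    key = "rank" if by == "relevance" else "modified"
    return sorted(results, key=lambda r: r[key], reverse=True)


results = [
    {"url": "http://www.example.com/old.html", "rank": 0.9, "modified": date(2007, 1, 5)},
    {"url": "http://www.example.com/new.html", "rank": 0.4, "modified": date(2008, 4, 22)},
]

print([r["url"] for r in sort_results(results)])
print([r["url"] for r in sort_results(results, by="date")])
```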

Excerpts, query words highlighting

An excerpt is a piece of a found document with the searched-for words highlighted, to give an idea of what the document is about. You can customize the number of excerpts displayed and their length. If you disable excerpts, the beginning of the document will be displayed instead.

Every found document is accompanied by a "Cached" link. ASPseek keeps a local compressed copy of every document processed, so the user can see the whole document with the searched-for words (optionally) highlighted, even if it has been removed from the original site (that happens sometimes).
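
A rough Python sketch of building one excerpt with highlighted query words follows. The excerpt function, the width parameter, and the <b> markers are assumptions made for the example; ASPseek's own output is controlled by its search templates.

```python
import re


def excerpt(text, words, width=40):
    """Cut a window around the first query word and wrap matches in <b> tags."""
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return text[: 2 * width]  # no hit: fall back to the document start
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    pieces, pos = [], start
    # Match against the full text so words cut by the window are not highlighted.
    for match in pattern.finditer(text):
        if match.start() < start or match.end() > end:
            continue
        pieces.append(text[pos:match.start()])
        pieces.append("<b>" + match.group(0) + "</b>")
        pos = match.end()
    pieces.append(text[pos:end])
    return "".join(pieces)


doc = "ASPseek keeps a local compressed copy of every document processed"
print(excerpt(doc, ["copy"], width=10))
```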

Grouping results by site

Results from one site can be grouped together. If grouping by site is on, only two results from the same site are displayed by default, and the user can see other pages from that site by following a "More results from ..." link.
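
Grouping by site can be sketched in Python as follows: at most two URLs per host are shown, and the rest are counted so that a "More results from ..." link could be offered. The group_by_site function and its return shape are hypothetical, and treating the host part of the URL as the "site" is an assumption.

```python
from urllib.parse import urlsplit


def group_by_site(urls, per_site=2):
    """Keep the first per_site URLs of each host; count the rest per host."""
    shown, hidden, seen = [], {}, {}
    for url in urls:
        site = urlsplit(url).netloc
        seen[site] = seen.get(site, 0) + 1
        if seen[site] <= per_site:
            shown.append(url)
        else:
            hidden[site] = hidden.get(site, 0) + 1
    return shown, hidden


ranked = [
    "http://a.example/1.html",
    "http://a.example/2.html",
    "http://a.example/3.html",
    "http://b.example/1.html",
]
print(group_by_site(ranked))
```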

Clones

Clones are identical documents at different locations. They are detected and grouped together, so the user will not be presented with a page full of URLs pointing to identical documents.

Clone detection is usually limited to one site (so identical documents from different sites are not counted as clones), but you can change this by recompiling ASPseek with the --disable-clones-by-site option.
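
One common way to detect identical documents is to compare content hashes, as in the Python sketch below. This is an assumed technique for illustration; how ASPseek actually detects clones is not stated here. The find_clones function is a hypothetical name, and its by_site flag only mimics the default per-site behaviour described above.

```python
import hashlib
from urllib.parse import urlsplit


def find_clones(documents, by_site=True):
    """Group URLs whose bodies are byte-for-byte identical.

    documents is an iterable of (url, body_bytes) pairs. With by_site=True,
    only documents on the same host are treated as clones of each other.
    """
    groups = {}
    for url, body in documents:
        digest = hashlib.sha256(body).hexdigest()
        key = (urlsplit(url).netloc, digest) if by_site else digest
        groups.setdefault(key, []).append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


docs = [
    ("http://a.example/index.html", b"<html>same</html>"),
    ("http://a.example/home.html", b"<html>same</html>"),
    ("http://b.example/index.html", b"<html>same</html>"),
]
print(find_clones(docs))
print(find_clones(docs, by_site=False))
```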

Spaces and subsets

A space is a set of sites. If you want to provide a search narrowed to some area, you can create a space and search within it. Only whole sites (e.g. http://www.mysite.com/) can be included in a space.

Subsets can also be used to restrict the search. You can create a subset, put a URL mask (like http://www.mysite.com/mydir/%) into it, and then limit the search scope to that subset.

You can restrict the search scope to not just one but several subsets or spaces.
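
The URL mask in the subset example looks like an SQL LIKE pattern. Assuming that "%" matches any run of characters and everything else is literal (an assumption, not something confirmed here), mask matching can be sketched in Python like this. Both function names are hypothetical.

```python
import re


def mask_to_regex(mask):
    """Turn a mask into a regex, assuming '%' means "any run of characters"."""
    literal_parts = [re.escape(part) for part in mask.split("%")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


def in_subset(url, masks):
    """Return True if the URL matches at least one of the subset's masks."""
    return any(mask_to_regex(mask).fullmatch(url) for mask in masks)


masks = ["http://www.mysite.com/mydir/%"]
print(in_subset("http://www.mysite.com/mydir/page.html", masks))
print(in_subset("http://www.mysite.com/other/page.html", masks))
```

Passing several masks to in_subset corresponds to restricting a search to several subsets at once.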

HTML templates for easy-to-customize search results

You can customize your search pages so they look like, and integrate seamlessly with, the rest of your site. This is done by simply editing the search template file.


Installation

gzip -dc aspseek-1.2.10.tar.gz | tar xf -
cd aspseek-1.2.10
./configure
make
su
make install

ASPseek 1.2.10 supports different languages (including English). It works with Linux.

Downloading ASPseek 1.2.10 will take a minute if you use a fast ADSL connection.


ASPseek 1.2.10 Version History

Product Date Added
ASPseek 1.2.10 2008-04-22 11:07:20


