Sphider-plus version 3.2017a - The PHP Search Engine





All required information.

[ Change Log Summary ]

 

- Current release:    3.2017a


- Former versions:

          Version 3.2016d

          Version 3.2016c

          Version 3.2016b

          Version 3.2016a

 

          Version 3.2015e          Version 3.2014c

          Version 3.2015d          Version 3.2014b

          Version 3.2015c          Version 3.2014a

          Version 3.2015b          Version 3.2013b

          Version 3.2015a          Version 3.2013a

 

 

 

 

- Older versions:

          Version 2.9          Version 1.9

          Version 2.8          Version 1.8

          Version 2.7          Version 1.7

          Version 2.6          Version 1.6

          Version 2.5          Version 1.5

          Version 2.4          Version 1.4

          Version 2.3          Version 1.3

          Version 2.2          Version 1.2

          Version 2.1          Version 1.1

          Version 2.0          Version 1.0

 





Version: 3.2016d

Release date: October 11, 2016

Build up with Sphider: v.1.3.5

 

New feature:

Ignore the content inside of <option> tags in the body part of the HTML.

To be activated in admin backend.

 

New feature:

Besides the JSON and XML result output files, an RSS feed is now also created.

Separate feeds are created for text and media results.

 

New feature:

In the query log of admin 'Statistics', show or hide the hostname, country and geo info for each query input.

To be activated in admin backend.

 

If selected in admin backend, the advanced part of the search form is now suppressed.

 

Bug fixed in result presentation 'Like Google (Top 2 per URL)'.

 

Bug fixed in search form for category selection.

 

Bug fixed in option 'Do not index comment parts'.

 

Some small bugs fixed.

 

Involved files that have been modified / added for this release:

.../admin/admin.php

.../admin/admin_header.php

.../admin/configset.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

 

.../include/search_10.php

.../include/search_40.php

.../include/searchfuncs.php

.../include/xml.php

.../include/common/black_ips_priv.txt

 

.../templates/html/020_search-form.html

.../templates/html/025_search-form.html

 

 


 

 



Version: 3.2016c

Release date: May 30, 2016

Build up with Sphider: v.1.3.5

 

New feature:

- Index only e-mail accounts like 'my-name@gmail.com':

  (Extracts all e-mail accounts from the page content.)

And as the inverse function:

- Do not index any e-mail account as part of page content.

Both to be enabled in admin backend.
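The two e-mail options above can be pictured with a short Python sketch. Sphider-plus itself is written in PHP and its actual pattern is not documented here; the regular expression below is a simplified assumption (not the full RFC 5322 address grammar), and `extract_emails` / `strip_emails` are hypothetical names.

```python
import re

# Simplified e-mail pattern (assumption): local part, '@', domain, dot, TLD.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

def extract_emails(text: str) -> list[str]:
    """'Index only e-mail accounts': keep just the addresses found in the text."""
    return EMAIL_RE.findall(text)

def strip_emails(text: str) -> str:
    """Inverse option: remove every address, then collapse leftover whitespace."""
    return " ".join(EMAIL_RE.sub(" ", text).split())

page = "Contact my-name@gmail.com or sales@example.org for details."
print(extract_emails(page))  # ['my-name@gmail.com', 'sales@example.org']
print(strip_emails(page))    # Contact or for details.
```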

 

New feature:

Block all queries sent by known spammer IPs.

The list of spammer IPs is updated automatically every 24 hours and contains about 190,000 entries.

To be activated in admin backend.

 

Improved index procedure:

No longer aborting the complete indexation for 'NOHOST' and 'Too many re-directions'.

If detected on single links (pages), only the affected links will be skipped.

 

Improved index and search procedures:

Convert all kinds of accents and diacritics like á, ç, ê, ì, ü into their base letters.

Will present the same results for queries with and without accents.
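A minimal Python sketch of accent folding, assuming Unicode decomposition (NFD) followed by removal of combining marks. This is one common technique, not necessarily the one Sphider-plus uses; `fold_accents` is a hypothetical helper. Applying the same folding to both the indexed text and the query is what makes accented and unaccented queries meet.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Map accented letters to their base letters (á -> a, ç -> c, ü -> u)."""
    # NFD splits 'á' into 'a' + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping non-spacing combining marks (category 'Mn') leaves the base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(fold_accents("á ç ê ì ü"))  # a c e i u
# Index and query are folded the same way, so both spellings match:
print(fold_accents("Café") == fold_accents("Cafe"))  # True
```

Letters without a Unicode decomposition, such as ł or ø, pass through unchanged and would need an extra mapping table.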

 

Improved index and search procedures:

Now removing all emoji characters (smileys) from full text,

so that systems still using MySQL versions older than 5.5.3

will be able to highlight search results correctly.
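Background for the MySQL 5.5.3 limit: the older `utf8` character set stores at most three bytes per character, which covers only the Basic Multilingual Plane (U+0000 to U+FFFF); four-byte characters, where most emoji live, need `utf8mb4`, which arrived in 5.5.3. The Python sketch below approximates "remove emoji" as "remove every character above U+FFFF". That is an assumption: a few emoji-like symbols such as ☺ (U+263A) sit inside the BMP and would survive.

```python
def strip_astral(text: str) -> str:
    """Drop characters above U+FFFF, which need 4 bytes in UTF-8 (most emoji)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)

sample = "Pizza 🍕 tonight 😀"
cleaned = strip_astral(sample)
print(repr(cleaned))  # 'Pizza  tonight '
# Every remaining character now fits into MySQL's 3-byte 'utf8'.
print(max(len(ch.encode("utf-8")) for ch in cleaned))  # 1
```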

 

Corrected an Apache glitch that produces %252F instead of %2F in URLs. Instead of using the Apache rewrite module with the NE flag, a PHP solution was implemented, so those links no longer break during the index procedure.

 

Improved error reporting when the PDF converter fails during the index procedure.

 

Updated bot and harvester list (black IPs) to prevent unwanted queries.

 

Bug fixed: More than 3 redirections applied to one page URL forced the index procedure to abort.

 

Bug fixed in highlighting Urdu language text.

 

Bug fixed in authentication script for first call.

 

Some small bugs fixed.

 

Involved files that have been modified / added for this release:

.../admin/admin.php

.../admin/admin_search.php

.../admin/auth.php

.../admin/configset.php

.../admin/db_copy.php

.../admin/db_main.php

.../admin/messages.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../admin/settings/backup/Sphider-plus_default-configuration.php

 

.../include/commonfuncs.php

.../include/commons.php

.../include/search_10.php

.../include/search_40.php

.../include/search_media.php

.../include/searchfuncs.php

.../include/suggest.php

 

.../include/common/black_ips_priv.txt

 

.../templates/html/020_search-form.html

.../templates/html/025_search-form.html

.../templates/html/080_most_pop.html

 

 


 

 



Version: 3.2016b

Release date: March 22, 2016

Build up with Sphider: v.1.3.5

 

New feature:

Besides the XML result output file, a JSON file is now also created.

 

New feature:

Protect the .../admin/ folder against external access by foreign clients (IPs).

To be activated in admin backend.

 

New feature:

Protection of the .../admin/ folder by means of a .htaccess file is now activated by default.

 

New feature:

Suppress the user's right mouse click (context menu) in the result listing.

To be activated in admin backend.

 

Improved media search. Now obeying the minimum word length as defined in admin backend.

 

Improved suggest framework. Now offering independent suggestions for text and media queries.

 

 

Bug fixed that prevented caching of queries containing numbers.

 

Bug fixed for media search, if multiple databases are involved.

 

Bug fixed for media search, now offering media suggestions.

 

Some small bugs fixed.

 

 

Involved files that have been modified / added for this release:

.../.htaccess

.../search_ini.php

.../admin/admin.php

.../admin/admin_header.php

.../admin/configset.php

.../admin/messages.php

.../admin/settings_backup.php

.../admin/spider.php

.../admin/spiderfuncs.php

 

.../include/commonfuncs.php

.../include/search_10.php

.../include/search_40.php

.../include/search_links.php

.../include/search_media.php

.../include/searchfuncs.php

.../include/suggest.php

.../include/xml.php

 

.../settings/.htaccess

 

.../templates/html/010_html_header.html

.../templates/html/011_html_header.html

.../templates/html/020_search-form.html

.../templates/html/025_search-form.html

.../templates/html/070_more-results.html

.../templates/html/200_no media-found.html

 

 


 

 



Version: 3.2016a

Release date: February 10, 2016

Build up with Sphider: v.1.3.5

 

New feature:

For promoted result presentation, multiple keywords (catchwords)

and multiple domain names can now be defined.

New feature:

If available, show complete phrases in the result listing as text extract

around the found keyword (search term).

To be activated in admin backend.

New feature:

Index only feeds and ignore all other page content like text and media.

Nevertheless, all links will be followed.

To be activated in admin backend.

New feature:

Index YouTube-hosted videos.

To be activated in admin backend.

New feature:

If the option 'Search strictly for search results' is not activated in admin settings,

domains can be searched with and without the www prefix.

New feature:

Ignore the content of meta tags which are placed in the body part of the HTML.

Nevertheless, all links will be followed.

To be activated in admin backend.

New feature:

Ignore the content inside of noscript tags like <noscript> THIS CONTENT </noscript>,

which might be placed in body part of the HTML.

To be activated in admin backend.

New feature:

Ignore the content of cookies, which might be added to the page content.

To be activated in admin backend.

New feature:

Do not store links and their attributes as keywords.

To be activated in admin backend.

New feature:

Database support for full UNICODE, including astral symbols.

Requires MySQL server version 5.5.3+

New feature:

Compressed transfer over the Internet enabled for page content and PHP scripts.

Depending on the server environment, this feature may not work on all servers.

 

Improved MySQL database support:

- Now creating tables in compressed format.

- Protection to prevent error 1071: Specified key was too long, max key length is 767 bytes.
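The 767-byte key limit behind error 1071 turns into a character limit once the character set is known. Assuming the worst case of 3 bytes per character for MySQL `utf8` and 4 bytes for `utf8mb4`, the longest fully indexable VARCHAR follows from integer division. This Python snippet only works through that arithmetic; the note above does not say which key lengths Sphider-plus actually chose.

```python
MAX_KEY_BYTES = 767  # index key limit quoted in MySQL error 1071

def max_indexable_chars(bytes_per_char: int, limit: int = MAX_KEY_BYTES) -> int:
    """Longest column (in characters) whose worst-case byte size fits the key limit."""
    return limit // bytes_per_char

print(max_indexable_chars(3))  # 255 -> utf8 (up to 3 bytes per character)
print(max_indexable_chars(4))  # 191 -> utf8mb4 (up to 4 bytes per character)
```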

Improved index procedure:

If 'Spider can leave domain during index procedure' is not activated in admin settings,

the external links are not stored as keywords.

Improved UP and DOWN buttons in admin 'Settings' menu, and also in result listing.

Wrapper added to bypass the PHP bug (known since PHP v.5.3)

where gzopen() is only available as gzopen64(), and likewise for all other gz functions.

 

Bug fixed in storing the admin and dispatcher e-mail accounts in admin backend.

Bug fixed in <!--sphider_noindex--> directive.

Bug fixed for search terms with a length < 3 characters.

Some more small bugs fixed.

 

Involved files that have been modified / added for this release:

Nearly all. Because the MySQL database collation and connector have been modified for this version, a fresh installation is required.

 

 


 

 



Version: 3.2015e

Release date: September 24, 2015

Build up with Sphider: v.1.3.5

 

New feature:

Block all queries for e-mail accounts like 'my-name@gmail.com'

To be activated in admin backend.

 

New feature in admin backend:

Create a default configuration file as backup of current settings and options.

 

New feature:

Sphider-plus may now be installed on servers using the HTTP as well as the HTTPS protocol.

 

Improved index procedure:

Now automatically limiting the amount of data (full text) stored in database

to the max. size of database table field.

 

Ready to run in a PHP 7 environment.

Tested with PHP 7.0.0 RC3.

 

Updated bad bot list. Now containing 1040 bot names.

 

Updated list of Meta search engines.

 

Admin backend updated for Opera and Chrome browser.

 

Bug fixed for 'Show EXIF info' in admin backend statistics 'Indexed Images'.

Bug fixed while simulating a web shot in admin backend.

Bug fixed for presenting multiple result pages in media search.

Some small bugs fixed.

 

Involved files that have been modified / added for this release:

.../admin/admin.php

.../admin/admin_header.php

.../admin/auth.php

.../admin/auto_index.php

.../admin/configset.php

.../admin/confirm.

.../admin/geoip.php

.../admin/messages.php

.../admin/setting_backup.php

.../admin/spiderfuncs.php

.../include/commonfuncs.php

.../include/commons.php

.../include/search_10.php

.../include/common/black_ips.txt

.../include/common/black_uas.txt

.../templates/Pure/adminstyle.css

.../templates/Slade/adminstyle.css

.../templates/Sphider-plus/adminstyle.css

 

 


 

 



Version: 3.2015d

Release date: July 06, 2015

Build up with Sphider: v.1.3.5

 

New feature for command line operation:

Index with respect to the preference level. To be invoked by:

-preferred <level>

 

Improved admin backend:

Verification that the PHP and MySQL server clocks run synchronously and without offset.

 

Modified 'Pure' template.

 

Bug fixed in sub-menu of 'Server Info'.

Bug fixed in warning messages as part of index procedure.

Bug fixed in sub-folder creation for first log-in.

Some more small bugs fixed.

 

Involved files that have been modified / added for this release:

.../addurl.php

.../search.php

.../admin/admin.php

.../admin/admin_header.php

.../admin/admin_search.php

.../admin/auth.php

.../admin/auth_db.php

.../admin/auto_index.php

.../admin/configset.php

.../admin/db_copy.php

.../admin/db_main.php

.../admin/geo_show.php

.../admin/install_tables.php

.../admin/messages.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../templates/Pure/adminstyle.css

.../templates/Pure/userstyle.css

 

 


 

 


Version: 3.2015c

Release date: May 29, 2015

Build up with Sphider: v.1.3.5

 

Compared to version 3.2015b, the following modifications have been added:

 

New option to define the chronological order of text result listing:

Single result per page (ignoring arguments in URL)

Will present only one result for URLs like:

. . . /pizza-restaurants.html

. . . /pizza-restaurants.html?page=1

. . . /pizza-restaurants.html?page=2
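One way to obtain this behaviour, sketched in Python under the assumption that the results already arrive in ranking order: reduce every URL to scheme, host and path, and keep only the first result per reduced key. `page_key` and `one_result_per_page` are hypothetical names, not Sphider-plus functions.

```python
from urllib.parse import urlsplit, urlunsplit

def page_key(url: str) -> str:
    """Reduce a URL to scheme, host and path; query string and fragment are dropped."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def one_result_per_page(ranked_urls: list[str]) -> list[str]:
    """Keep only the best-ranked URL for each page key."""
    seen: set[str] = set()
    kept = []
    for url in ranked_urls:
        key = page_key(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept

results = [
    "https://example.com/pizza-restaurants.html",
    "https://example.com/pizza-restaurants.html?page=1",
    "https://example.com/pizza-restaurants.html?page=2",
    "https://example.com/pasta.html",
]
print(one_result_per_page(results))
# ['https://example.com/pizza-restaurants.html', 'https://example.com/pasta.html']
```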

 

New method of script addressing:

Instead of relative addressing, the scripts now use server-based addressing.

Consequently, cron jobs may call the index procedure from anywhere,

regardless of where the Sphider-plus scripts live on the server.

 

New feature:

Admin backend protected against XSRF attacks (Cross-Site Request Forgery).

Independent of IDS.

 

New feature:

Admin backend protected against SFA attacks (Session-Fixation-Attacks).

Independent of IDS.

 

New feature:

Limit the duration of a session (time-out) in admin backend.

To be defined in 'Settings' menu as seconds of inactivity.

 

New feature:

Prevent search form from being flooded by too many queries per unit of time.

To be activated in admin settings.
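A minimal sliding-window limiter in Python illustrates the idea. The class name and the limit of 3 queries per 10 seconds are assumptions for illustration; the actual thresholds are defined in admin settings. The timestamp is passed in explicitly to keep the example deterministic; a real caller would use something like `time.monotonic()`.

```python
from collections import defaultdict, deque

class QueryFloodGuard:
    """Allow at most `max_queries` per client IP within `window` seconds."""

    def __init__(self, max_queries: int, window: float) -> None:
        self.max_queries = max_queries
        self.window = window
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted queries

    def allow(self, client_ip: str, now: float) -> bool:
        hits = self._hits[client_ip]
        # Forget queries that have left the time window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_queries:
            return False  # rejected queries are not recorded
        hits.append(now)
        return True

guard = QueryFloodGuard(max_queries=3, window=10.0)
print([guard.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(guard.allow("203.0.113.7", 10.5))  # True: the query at t=0 has expired
```

Because rejected queries are not recorded, a blocked client is admitted again as soon as its older accepted queries leave the window.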

 

Improved suggest framework:

The search function now starts immediately after selecting any keyword,

without clicking the 'search' button.

 

Improved user statistics in admin backend:

Now delivering all available Geo info, including Google map for IP location.

 

Improved detection of Operating System.

 

Updated data file for GeoIP.

 

Updated black list for Meta search engines.

 

Updated jQuery framework.

 

Bug fixed in function 'Index media'.

 

Bug fixed in 'Multithreaded indexing'.

 

Some more small bugs fixed.

 

 

Involved files that have been modified / added for this release:

Nearly all. Please use all scripts as per download for your installation,

except all files in

.../templates/Pure/

.../templates/Slade/

.../templates/Sphider-plus/

These files remained unchanged since last version of Sphider-plus.






Version: 3.2015b

Release date: March 09, 2015

Build up with Sphider: v.1.3.5

 

Compared to version 3.2015a, the following modifications have been added:

 

New feature for index procedure:

- Instead of the HTML tags 'title' and 'description', use admin edited title and description.

  Will be overwritten, if one of the following new features is selected:

- Use admin edited title, if count of words in HTML title tag is less than: xxx

- Use admin edited description, if count of words in HTML description tag is less than: xxx

 

New option in admin backend:

Delete all sitemaps created during index procedures, which were performed in debug mode.

To be found in sub menu 'Clean'.

 

User and admin interfaces are now HTML5 conformant. With special thanks to Ulf Dunkel for his code input.

 

Updated scripts for ID3 extraction. Now parsing ID3v1(v1.0 & v1.1) as well as ID3v2(v2.2 & v2.3 & v2.4).

 

Bug fixed in 'Multithreaded indexing' in conjunction with sitemap.xml files.

Bug fixed in option:

Allow other hosts with same domain name for all links found during indexing. Also ignore TLD, SLD and www

Bug fixed in 'Ignoring parts of a page defined by <div id=> or <div class=>' in conjunction with nested div tags.

Bug fixed in 'Activate/disable database' menu for multiple databases containing the same table prefix.

Bug fixed in 'Import / Export URL list' for multiple categories per site.

Some small bugs fixed.

 

Involved files that have been modified / added for this release:

.../addurl.php

.../root.php

.../search.php

.../search_ini.php

.../admin/admin.php

.../admin/configset.php

.../admin/db_main.php

.../admin/http.php

.../admin/index_media.php

.../admin/install_tables.php

.../admin/messages.php

.../admin/setting_backup.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../admin/getid3/all scripts

.../converter/docx/AutoLoader.inc.php

.../converter/docx/CreateDocx.inc.php

.../converter/docx/Docx2Text.inc.php

.../include/categoryfuncs.php

.../include/commonfuncs.php

.../include/search_10.php

.../include/search_media.php

.../include/show_id3.php

.../templates/html/all files






Version: 3.2015a

Release date: January 06, 2015

Build up with Sphider: v.1.3.5

 

 

New feature:

Responsive design for search form, result listing and addurl form.

Automatically adapting to the display size of computers, tablets, smartphones, etc.

 

New feature:

Use list of ul tag classes to ignore the corresponding ul content during index/re-index.

A common list of ul class values is used to ignore parts of a page.

Content between <ul class='this_value'> and </ul> will be ignored,

however links in it are still followed. Multiple and nested ul tags are handled.

Values in common list may end with a wildcard, so that 'menu*' will work for

menu1, menu2, menu_left, etc.

Usable also for external pages, if it is impossible to add the <!--sphider_noindex--> tags.

Details in chapter: Ignoring parts of a page defined by <ul class=> . . . . </ul>
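The rules above (ignore the text of matching <ul> blocks, keep following their links, handle nesting, allow a trailing wildcard) can be sketched with Python's standard `html.parser`. This is an illustration, not the Sphider-plus PHP implementation; the class `UlFilter` and the sample HTML are hypothetical, and `fnmatchcase` stands in for the `menu*` wildcard matching.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser

class UlFilter(HTMLParser):
    """Collect page text outside ignored <ul> blocks while still collecting every link."""

    def __init__(self, ignore_classes: list[str]) -> None:
        super().__init__()
        self.ignore_classes = ignore_classes  # e.g. ["menu*"]; '*' acts as a wildcard
        self.ul_stack: list[bool] = []        # one flag per open <ul>: is it ignored?
        self.text: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "ul":
            classes = (attributes.get("class") or "").split()
            ignored = any(fnmatchcase(c, pattern)
                          for c in classes for pattern in self.ignore_classes)
            self.ul_stack.append(ignored)
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])  # links are followed even inside ignored lists

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_stack:
            self.ul_stack.pop()

    def handle_data(self, data):
        # Any ignored <ul> on the stack hides its text, including text in nested lists.
        if not any(self.ul_stack) and data.strip():
            self.text.append(data.strip())

sample_html = """<p>Pizza menu</p>
<ul class="menu_left"><li><a href="/home">Home</a></li>
  <ul><li>Nested item</li></ul></ul>
<ul class="ingredients"><li>Tomato</li></ul>"""

parser = UlFilter(["menu*"])
parser.feed(sample_html)
parser.close()
print(parser.text)   # ['Pizza menu', 'Tomato']
print(parser.links)  # ['/home']
```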

 

New feature:

Use list of pre tag classes to ignore the corresponding pre content during index/re-index.

Details in chapter: Ignoring parts of a page defined by <pre class='abc'> . . . . </pre>

 

New feature:

Users of the search engine may select whether the advanced part of the search form should be presented.

 

New feature in admin backend:

Verify the time required to create a web shot of any desired URL.

 

Improved protection against intrusion attempts.

 

Accelerated search algorithm, especially for presentation of multiple results per page.

 

Common footer file for addurl.php script (templates/html/092_footer.html).

 

Updated scripts for ID3 extraction. Now parsing ID3v1(v1.0 & v1.1) as well as ID3v2(v2.2 & v2.3 & v2.4).

 

Updated IDS scripts.

 

Updated bot and harvester list (black_ips.txt) to prevent unwanted queries.

 

Enhanced .htaccess file. For details see item 11 in .../.htaccess

 

Bug fixed in option 'Use list of div ids and classes to ignore the div content during index/re-index'.

 

Some small bugs fixed.

 

Involved files that have been modified / added for this release:

.../.htaccess

.../addurl.php

.../search.php

.../search_ini.php

.../admin/admin.php

.../admin/admin_header.php

.../admin/admin_search.php

.../admin/auth_db.php

.../admin/auto_index.php

.../admin/configset.php

.../admin/db_activate.php

.../admin/db_config.php

.../admin/db_copy.php

.../admin/db_main.php

.../admin/settings_backup.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../admin/ID3/all_files

.../include/commonfuncs.php

.../include/commons.php

.../include/click_counter.php

.../include/ids_handler.php

.../include/search_10.php

.../include/search_40.php

.../include/search_media.php

.../include/show_id3.php

.../include/common/black_ips

.../include/IDS/all scripts

.../languages/all scripts

.../templates/html/015_headline.html

.../templates/html/020_search-form.html

.../templates/html/025_search-form.html

.../templates/html/030_category-selection.html

.../templates/html/040_category-tree.html

.../templates/html/050_result-header.html

.../templates/html/060_text-results.html

.../templates/html/080_most_pop.html

.../templates/html/092_footer.html

.../templates/html/120_media-only results.html

.../templates/Pure/hdline.

.../templates/Pure/userstyle.css

.../templates/Slade/hdline.jpg

.../templates/Slade/userstyle.css

.../templates/Sphider-plus/hdline.jpg

.../templates/Sphider-plus/userstyle.css






Version: 3.2014c

Release date: September 28, 2014

Build up with Sphider: v.1.3.5

 

Compared to version 3.2014b, the following modifications have been added:

 

New option:

Index only media content and ignore any text.

To be activated in Admin settings.


New option:

Do not index content placed in a div with class="s-hidden", like:

<div id=" . . ." class="s-hidden" > this content </div>


New option:

Treat localhost URLs in conformance with RFC 1808.

If activated, http://localhost/ will always be used as root directory.

Otherwise URLs like http://localhost/public_html/my_site/

will also be accepted as root directory.


New option:

Admin and database access authorization are now available as part of the 'Settings' menu in Admin backend.

'User name' and 'Password' for Admin access, and also for the database configuration, are no longer stored as readable values. Now they are stored hashed in a separate file.


Accelerated index procedure for media content like images.


New feature in index procedure:

Indexation of media content enabled, even if no text content is available (page contains less than x words).


Improved index procedure:

- Now extracting the 'option' values in <select> tags.

- Now splitting words also at multiple special characters.

- Remove all tag content from full text.

- Improved charset detection.


Improved option

'Convert all kind of accents like á, ç, ê, ì, ü, into their basic vowels'

Now additionally all special HTML character codes for Czech, Romanian, Slovak and Slovenian

languages will be treated as accents and translated into their basic letters.

Examples:

Ę => E

ů => u

ť => t

ž => z

etc.


Improved intrusion protection:

- Prevent 'session fixation attacks'

- Improved protection against SQL injection, even without activated IDS


Updated link and charset detection for HTML5 coded URLs.


Updated Danish language file. Thanks to 'incognito'.


Bug fixed in the result listing for titles containing %20 blanks.


Some small bugs fixed.



Involved files that have been modified / added for this release:

.../admin/auth.php

.../admin/auth_db.php

.../admin/configset.php

.../admin/db_config.php

.../admin/index_media.php

.../admin/messages.php

.../new_authent.php

.../admin/spiderfuncs.php

.../admin/settings/authentication.php

.../include/commonfuncs.php

.../include/search_40.php

.../include/search_media.php

.../include/searchfuncs.php

.../languages/dk-language.php






Version: 3.2014b

Release date: June 03, 2014

Build up with Sphider: v.1.3.5

 

Compared to version 3.2014a, the following modifications have been added:

 

New option:

Obey the list of words not to be suggested.

To be activated in Admin settings.

For details, please see the chapter: Suggest framework

New option:

Suppress IDS warning messages in Admin backend.

If activated in Admin settings, all warning messages delivered by the 'Intrusion Detection System'

are suppressed in the dialogue of the Admin backend.

Improved search algorithm for queries containing numbers.

- If the search term 12.34 does not deliver any result, 12,34 will be used instead.

The same applies vice versa.

- If the search term test123 does not deliver any result, test 123 will be used instead.

The same applies vice versa.
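A Python sketch of the two fallback rules. `alternate_queries` is a hypothetical helper, and treating "a letter directly followed by a digit" as the split point of test123 is an assumption about how such terms are divided.

```python
import re

def swap_separator(match: re.Match) -> str:
    before, sep, after = match.groups()
    return before + ("," if sep == "." else ".") + after

def alternate_queries(term: str) -> list[str]:
    """Alternative spellings to try when `term` returns no result."""
    alternates = []
    # 12.34 <-> 12,34: swap a decimal separator that sits between two digits.
    if re.search(r"\d[.,]\d", term):
        alternates.append(re.sub(r"(\d)([.,])(\d)", swap_separator, term))
    # test123 <-> test 123: insert or remove a blank between a letter and a digit.
    if re.search(r"[^\W\d_]\d", term):
        alternates.append(re.sub(r"([^\W\d_])(\d)", r"\1 \2", term))
    elif re.search(r"[^\W\d_] \d", term):
        alternates.append(re.sub(r"([^\W\d_]) (\d)", r"\1\2", term))
    return alternates

print(alternate_queries("12.34"))    # ['12,34']
print(alternate_queries("test123"))  # ['test 123']
```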

Improved detection of the title tag in the HTML head. If the <title> tag is not available, now also parsing <meta name="title" content="this title"/>

Warning message added, if the cURL library as part of the PHP environment is not installed.

Improved suggest framework. Now delivering suggestions also for integer numbers.

Improved image search. Now searching in the title of the image as well as in the file name. Thumbnails are now also created for .png images that are marked as .jpg.

Improved word stemming implementation. Now allowing searches for both original and stemmed words.

Improved .doc converter. Now supporting 32 bit and 64 bit Windows OS.

Improved .ppt converter. Now supporting 32 bit and 64 bit Windows OS.

Improved Admin interface. If a site is member of multiple categories, all of them are presented in 'Sites' view for the according URL.

Length of 'Name of promoted domain' enlarged to 255 characters.

Length of 'Promoted catchword in text' enlarged to 255 characters.

Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title.

Bug fixed in category search for option 'More results from category xyz'.

Bug fixed in defining an external template folder, outside of the Sphider-plus installation folder.

Bug fixed in German word stemming for upper-case vowels.


Involved files that have been modified / added for this release:

.../admin/admin.

.../admin/auth.

.../admin/configset.

.../admin/index_media.

.../admin/messages.

.../admin/spiderfuncs.php

.../converter/catdoc.

.../converter/catppt.

.../include/commonfuncs.

.../include/search_10.

.../include/search_40.

.../include/searchfuncs.

.../include/search_media.

.../include/suggest.

.../include/stemming/de-stem.

.../templates/html/010_html_header.html




Version: 3.2014a

Release date: March 03, 2014

Build up with Sphider: v.1.3.5

 

Compared to version 3.2013b, the following modifications have been added:

 

New feature:

Equalize text results for query terms with and without ligatures.

Implemented for Latin ligatures in Unicode (Latin-derived alphabets) and also for ligatures

used only in phonetic transcription; medieval ligatures are not taken into consideration.

Will present the same results for:

cœur and coeur

= dezh = ʤ
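Equalizing ligatures amounts to mapping each ligature to its component letters before indexing and before searching. The explicit table in this Python sketch is an assumption covering only a handful of code points; it is needed because Unicode compatibility normalization (NFKD) splits ﬁ or ĳ but leaves œ untouched.

```python
# Hypothetical ligature table; a real one would list many more code points.
LIGATURES = {
    "œ": "oe", "Œ": "OE",
    "ĳ": "ij", "Ĳ": "IJ",
    "ﬀ": "ff", "ﬁ": "fi", "ﬂ": "fl",
}
_LIGATURE_TABLE = str.maketrans(LIGATURES)

def expand_ligatures(text: str) -> str:
    """Replace each ligature by its letter sequence so 'cœur' and 'coeur' index alike."""
    return text.translate(_LIGATURE_TABLE)

print(expand_ligatures("cœur"))  # coeur
print(expand_ligatures("ﬁnal"))  # final
```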

New feature:

Delete repeated characters from page content.

Will delete decoratively used duplicates, like *****, -----, =====, etc.
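A Python sketch of removing decorative runs. The threshold of three repetitions and the restriction to non-letter, non-digit characters are assumptions chosen so that normal words and numbers stay intact; the note above does not specify Sphider-plus's exact rule.

```python
import re

# A character that is neither a letter/digit nor whitespace, repeated 3+ times.
DECORATION_RE = re.compile(r"([^\w\s])\1{2,}")

def remove_decorations(text: str) -> str:
    """Delete decorative runs like *****, ----- or ===== and tidy the spacing."""
    return " ".join(DECORATION_RE.sub(" ", text).split())

print(remove_decorations("Menu ***** Specials ----- Prices ====="))  # Menu Specials Prices
print(remove_decorations("Price: 12.50 - 15.00"))  # Price: 12.50 - 15.00
```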

New feature:

Query strictly for search results. Available for AND, OR and PHRASE search modes.

When searching for 'data', results for 'database' or 'CDATA' will not be presented.
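Strict matching can be pictured as a whole-word, case-insensitive search. This Python sketch is an assumption about the technique, not the Sphider-plus code; lookarounds are used instead of `\b` so that terms ending in symbols are still anchored correctly.

```python
import re

def strict_match(term: str, text: str) -> bool:
    """True if `term` occurs as a whole word (case-insensitive), not inside a longer word."""
    # The lookarounds forbid a letter, digit or underscore directly before or after the term.
    pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None

print(strict_match("data", "Raw data for the report"))    # True
print(strict_match("data", "The database stores CDATA"))  # False
```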

New feature:

Index the meta contents in .xlsx and .docx files. If available in file, it will index:

Title, Subject, Keywords, Creator, Description,

Last Modified By, Revision, Date Created, and Date Modified.

New method implemented for manually aborting the index procedure.

A repair, optimize and flush of the tables is now always performed.

Also worked out for multithreaded indexing.

Thumbnails, created of the indexed URLs, will now be taken from top of the page; no longer centred.

UTF-8 charset ensured for data transfer from/to MySQL database.

Updated IDNA converter.

Updated English language file.

Several new session IDs implemented to be stripped during the index procedure.

Improved XML result output:

Now caching the XML results and additionally offering:

- Date and time of query.

- User IP.

- Name of remote host.

Also all XML output files are now stored for permanent usage in sub folder .../xml/stored/

To be deleted in Admin backend in 'Clear' menu.

Improved support for multiple table sets in one database.

If the option 'Do not index the full text' is activated in Settings menu, the 'Required number of words in a page in order to get indexed' is automatically set to zero.

Bug fixed in option: Show 'Description' Meta tag (if exists) in results listing.

Bug fixed in PHP error messages. Strict warnings will no longer be presented.


Involved files that have been modified / added for this release:

.../addurl.php

.../admin/admin.php

.../admin/admin_header.php

.../admin/admin_search.php

.../admin/auto_index.php

.../admin/db_activate.php

.../admin/db_common.php

.../admin/db_copy.php

.../admin/configset.php

.../admin/db_main.php

.../admin/install_tables.php

.../admin/messages.php

.../admin/real_get.php

.../admin/real_log.php

.../admin/settings_backup.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../include/click_counter.php

.../include/commonfuncs.php

.../include/idna_converter.php

.../include/media_counter.php

.../include/search_10.php

.../include/search_links.

.../include/search_media.php

.../include/searchfuncs.php

.../include/show_id3.php

.../include/suggest.php

.../include/xml.php

.../languages/en-language.php






Version: 3.2013b

Release date: September 19, 2013

Build up with Sphider: v.1.3.5

 

Compared to version 3.2013a, the following modifications have been added:

 

New option in Admin settings:

Use words in white- and blacklist as trimmed values.

If activated, for example the word ' cialis' in blacklist will match words like 'specialist'. If not activated, the leading blank of ' cialis' will prevent a match with the word 'specialist' in the full text.
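The effect of trimming can be shown with the spam term 'cialis', which occurs inside 'specialist'. This Python sketch assumes a plain case-insensitive substring test; `blacklist_hit` is a hypothetical name.

```python
def blacklist_hit(entry: str, text: str, trimmed: bool) -> bool:
    """Substring test of one blacklist entry against the page text (case-insensitive)."""
    needle = entry.strip() if trimmed else entry
    return needle.lower() in text.lower()

page = "Ask our specialist"
print(blacklist_hit(" cialis", page, trimmed=True))   # True: 'cialis' sits inside 'specialist'
print(blacklist_hit(" cialis", page, trimmed=False))  # False: ' cialis' needs a blank in front
```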

 

Improved index procedure for URLs containing quotes.

Improved black list comparison.

Improved xlsx converter.

Bug fixed in option 'Use list of div ids to ignore the complete div content during index/re-index'.

Bug fixed in 'Most Popular search' table at the bottom of result listing.

Bug fixed while indexing <!--sphider_noindex-->

Bug fixed in result listing, while sorted 'Like Google'.

Debug info removed from index procedure.

MySQLi error messages now are correctly assigned to the according SQL queries.

 

Involved files that have been modified / added for this release:

.../admin/admin.php

.../admin/auto_index.php

.../admin/configset.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../converter/xlsx_reader.php

.../include/commonfuncs.php

.../include/search_10.php

.../include/search_40.php

.../include/searchfuncs.php

.../templates/html/060_text-results.html

.../templates/html/080_most_pop.html






Version: 3.2013a

Release date: July 28, 2013

Build up with Sphider: v.1.3.5


Honor:

This version of Sphider-plus is dedicated to Anton Cygankov. For many months, he followed the development with ongoing testing, verifying especially the index procedure with thousands of URLs and reviewing all the bugs I introduced along the way. Thank you very much for this big effort. I would not have been able to reach the current state of development without your enormous support.
It took several months to come up with the v.3 release, but together with the MySQLi support, it seems to be a reasonable step into the future of this search engine.


Compared to version 2.9, the following modifications have been added:

New feature:

Index DOCX files. To be activated in Admin settings.

Implemented as PHP script, the converter needs no adaptation to the Operating System.

New feature:

Index XLSX files. To be activated in Admin settings.

Implemented as PHP script, the converter needs no adaptation to the Operating System.

New feature:

Index only prioritized sites. Level-dependent re-index of only those URLs that carry the corresponding level.

For details, please see the chapter: Prioritized indexing

New option:

Admin's 'Sites' table sorted by index priority.

New feature:

Create a thumbnail of all Internet URLs during index procedure. Will be presented as part of the text result listing for each link. To be activated in Admin backend.

For details, please see the chapter: Create thumbnails during index procedure

New feature:

Prevent indexing of known malware and phishing pages. This feature is supplied by a Google web service to prevent indexing of pages that contain malware or phishing content.

For details, please see the chapter: Prevent indexing of known malware and phishing pages

New feature:

If the blacklist is matched too often, automatically abort the indexation of the affected site. The limit is set to a count of 20.

New option:

Check correct converting of content into UTF-8

Will detect invalid charset definitions in Meta tags of the HTML header, or invalid charset definitions supplied via HTTP by the remote server. If an invalid charset is detected, the index procedure will be aborted for the affected link.

New feature:

The addurl form now stores only the domain name and TLD, something like 'sphider-plus.eu'.

Thus, www. and any sub folder of the suggested URL will be ignored.

New feature:

Ignore the content of style="display:none" in div elements. Something like:

<div style="display:none">ignore_this_content</div>

New feature:

In order to enable immediate query input, auto focus is set to the search form.

New suggest framework.

The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery.

For details, please see the chapter: Suggest framework

New feature:

Separate search fields for text and media queries. Consequently also separate suggestions

will be offered. To be activated in Admin 'Settings'.

New feature:

Restrict the search results by means of up to 5 categories simultaneously.

Import and export of URLs with multiple category definitions assigned to each site.

For details please notice the chapter: Parallel structure of search in categories.

New feature:

Site URLs using the https scheme are now indexed as well.

Improved index procedure:

- Link URLs with and without 'www' are now treated as equal, so duplicate pages are excluded.

- Self-referencing loops caused by HTTP 301/302/307 redirections are intercepted, which prevents endless indexing.

- Repeated attempts of a URL to redirect to itself force Sphider-plus to abort the index procedure for the affected site.

New option in Admin 'Settings' menu:

Define the number of redirections (1-9) followed for each link while indexing.
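The redirection limit and the loop protection can be sketched in Python as follows (an illustration, not the Sphider-plus PHP code). To keep the sketch runnable without network access, a dictionary stands in for the Location headers of HTTP 301/302/307 responses; the names follow_redirects and RedirectError are hypothetical.

```python
from urllib.parse import urljoin


class RedirectError(Exception):
    """Raised when a link's redirections cannot be followed safely (hypothetical)."""


def follow_redirects(url: str, location_of: dict, max_hops: int = 5) -> str:
    """Follow at most max_hops (1-9) redirections for one link.

    location_of is a hypothetical stand-in for the Location headers
    returned by the server: {requested_url: location_value}.
    """
    if not 1 <= max_hops <= 9:
        raise ValueError("max_hops must be between 1 and 9")
    seen = {url}
    hops = 0
    while url in location_of:
        if hops == max_hops:
            raise RedirectError(f"more than {max_hops} redirections")
        url = urljoin(url, location_of[url])  # Location may be relative
        if url in seen:  # redirect back to an URL already visited
            raise RedirectError(f"redirect loop back to {url}")
        seen.add(url)
        hops += 1
    return url
```

A link that redirects to itself is caught by the loop check on the first hop; a chain longer than the configured limit raises instead of being followed further.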

New options in Admin 'Settings' menu:

Follow URL redirections invoked by JavaScript, such as

<script . . . 'window.location.replace . . . . '

<script . . . var cURL = . . . .

<script . . . window.location = . . . . AND " + location.host + "

and several other script directives.

New options in Admin 'Settings' menu:

Follow URL redirections invoked by body tags, such as

<BODY onLoad = "parent.location = 'home.asp'">

'HTTP-EQUIV= . . refresh . . content= . . .'

and several other tags.

New option in Admin 'Settings' menu:

Obey refresh delay directives placed in meta tags, such as

<meta http-equiv="refresh" content="180;url=http://www.moodys.com.ar">
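A minimal Python sketch of reading such a directive (an illustration, not the Sphider-plus PHP code): it splits the content attribute into the delay in seconds and the optional target URL, which a crawler could then wait for and follow. The regular expression and the name parse_refresh are assumptions and accept only the common "delay;url=target" forms.

```python
import re

# content="<delay>[;url=<target>]" with optional quotes and spaces (assumed forms)
_REFRESH = re.compile(
    r"""^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?['"]?([^'"]*)['"]?)?\s*$""",
    re.IGNORECASE,
)


def parse_refresh(content: str):
    """Return (delay_seconds, target_url_or_None), or None if unparsable.

    parse_refresh is a hypothetical helper name.
    """
    match = _REFRESH.match(content)
    if match is None:
        return None
    delay = int(match.group(1))
    target = (match.group(2) or "").strip() or None  # None: refresh same page
    return delay, target
```

For the tag above, parse_refresh("180;url=http://www.moodys.com.ar") gives (180, 'http://www.moodys.com.ar').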

New option in Admin 'Settings' menu:

Do not index comment parts and scripts outside the HTML tags.

New option in Admin 'Settings' menu:

If not already present, add a trailing slash to the path of every detected link.

If the path contains a file name, this option is bypassed.

Also, if the server accepts the HTTP request for the main URL only without a trailing slash, this option is not applied.
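A minimal Python sketch of the slash rule (an illustration, not the Sphider-plus PHP code). It assumes that a last path segment containing a dot is a file name; the exception for main URLs that are only accepted without a slash needs an HTTP request and is not modelled. The name add_final_slash is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def add_final_slash(link: str) -> str:
    """Append '/' to the path unless it already ends with one or the
    last segment looks like a file name (assumption: it contains a dot).

    add_final_slash is a hypothetical helper name.
    """
    parts = urlsplit(link)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit(parts._replace(path=path))  # query and fragment kept
```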

New option in Admin 'Settings' menu:

Convert all link URLs to lower case characters.

New option in Admin 'Settings' menu:

Convert all link URLs found during indexing into UTF-8.

Will convert URLs like:

/3v/catalog/%C1%E0%E2%E0%F0%E8%FF+%E8%E7%F0%E0%E7%E5%F6/

into:   /3v/catalog/Бавария+изразец/
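A minimal Python sketch of this conversion (an illustration, not the Sphider-plus PHP code). It assumes the percent-escaped bytes are either UTF-8 or, failing that, in one known legacy charset; cp1251 is chosen only because it decodes the example above. The helper name link_url_to_utf8 is hypothetical. The result is a Unicode string, which is stored as UTF-8.

```python
from urllib.parse import unquote


def link_url_to_utf8(path: str, legacy_charset: str = "cp1251") -> str:
    """Decode percent-escapes in a link path into readable text.

    Tries UTF-8 first; if the escaped bytes are not valid UTF-8, falls back
    to legacy_charset (assumption: the charset of the page the link came from).
    '+' is left untouched because unquote (not unquote_plus) is used.
    link_url_to_utf8 is a hypothetical helper name.
    """
    try:
        return unquote(path, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return unquote(path, encoding=legacy_charset, errors="strict")
```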

Improved link detection:

- Invalid URLs containing duplicate slashes in their path are ignored.

- The following links are now followed:

<script>window.document.location ="/this.path";</script>

<script>window.document.location.href="/this.path";</script>

<script>window.location.replace("/this.path")</script>

<script>"https|http this URL"</script>

<body onload "/this.path">

and several others.
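The script forms listed above can be detected with a regular expression; the following Python sketch is an illustration, not the Sphider-plus PHP code. The pattern and the name js_redirect_targets are assumptions and cover only the assignment and replace() forms, not every variant a real page may use.

```python
import re

# Assumed subset of JavaScript redirections:
#   window[.document].location[.href] = "target"
#   window.location.replace("target")
_JS_LOCATION = re.compile(
    r"""window\.(?:document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|window\.location\.replace\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE,
)


def js_redirect_targets(html: str) -> list:
    """Return redirect targets found in inline JavaScript, in document order.

    js_redirect_targets is a hypothetical helper name.
    """
    return [m.group(1) or m.group(2) for m in _JS_LOCATION.finditer(html)]
```

Relative targets such as "/this.path" still have to be resolved against the page URL before they are queued for indexing.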

New option in Admin backend 'Clean' menu:

Truncate all tables in database.

Improved 'NOHOST' detection during index procedure:

The crawler now makes 5 attempts to contact the server.

Each attempt consists of 2 different HTTP requests.

Improved 'Add site' function in Admin backend.

URLs with the schemes 'http' and 'https' are now treated as equal, so duplicate sites are excluded.
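Several entries in this release treat URLs that differ only in 'www' or in the http/https scheme as duplicates. One way to express this is a comparison key, sketched here in Python (an illustration, not the Sphider-plus PHP code). Ignoring a trailing slash as well is an additional assumption; the name duplicate_key is hypothetical.

```python
from urllib.parse import urlsplit


def duplicate_key(url: str) -> str:
    """Comparison key ignoring the scheme, a leading 'www.' and a trailing
    slash (the last point is an assumption). Hypothetical helper name.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key
```

Two URLs with the same key would be rejected as duplicate sites or pages.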

Support added for the Windows-31J (CP932) charset, an extension of Shift JIS. (CP932 contains the standard 7-bit ASCII codes; Japanese characters start with a byte whose high bit is set to 1.)

UTF-8 support implemented for media titles, file names and ID3 tags.

MySQLi connector implemented between PHP and the MySQL database, using its object-oriented interface.

Bug fixed in option: Do not index the full text.

Bug fixed for URLs containing CP1252 coded paths.

Bug fixed in the detection of www/non-www links, which now prevents double indexing.

Bug fixed in 'Strip session ids'.

Bug fixed in Korean word segmentation.

Some small bugs killed.


Involved files that have been modified / added for this release:

As the MySQLi connector is now used between PHP and the MySQL database, nearly all scripts have been renewed.
A fresh installation is strongly recommended for this version.





Version: 2.9

Release date: November 01, 2012

Build up with Sphider: v.1.3.5


Compared to version 2.8, the following modifications have been added:


New feature:

Support for non-ASCII URLs, using 'Internationalized Domain Names' (IDN) as defined in RFC 3490, RFC 3491 and RFC 3492.

If activated, internationalized domain names like 'http://президент.рф/' and 'http://müller.de/' are accepted as new sites in the Admin backend as well as in the user's addurl form.

New feature:

Support for Punycode URLs like http://xn--90aoqlh7c4a.xn--d1abbgf6aiiy.xn--p1ai/

Will be converted into the readable form http://события.президент.рф/

To be activated in Admin settings.
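Both directions of the IDN handling can be sketched with Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490) with Punycode (RFC 3492). This is an illustration, not the Sphider-plus PHP code; the helper names are hypothetical, and the simplified URL rebuild drops any user info.

```python
from urllib.parse import urlsplit, urlunsplit


def host_to_ascii(host: str) -> str:
    """IDN to Punycode form, e.g. for storing 'müller.de' (hypothetical helper)."""
    return host.encode("idna").decode("ascii")


def url_to_readable(url: str) -> str:
    """Replace a Punycode host (xn--...) by its readable Unicode form.

    Hypothetical helper; user info in the URL is dropped by this sketch.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        return url
    host = parts.hostname.encode("ascii").decode("idna")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

For example, url_to_readable("http://xn--p1ai/") returns 'http://рф/'.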

New feature:

Besides the usual HTML elements written as <element>, also delete from the full text all HTML elements written with entities, like &lt;element&gt;

To be activated in Admin settings.
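A minimal Python sketch of removing such entity-encoded elements (an illustration, not the Sphider-plus PHP code). The regular expression and the name strip_escaped_tags are assumptions; each matched element is replaced by a space so that neighbouring words stay separate, while a plain "&lt;" that does not start a tag name is kept.

```python
import re

# '&lt;tag ...&gt;' or '&lt;/tag&gt;' written with HTML entities (assumed form)
_ESCAPED_TAG = re.compile(r"&lt;/?[a-z][a-z0-9]*\b.*?&gt;", re.IGNORECASE | re.DOTALL)


def strip_escaped_tags(text: str) -> str:
    """Remove entity-encoded HTML elements from the full text.

    strip_escaped_tags is a hypothetical helper name.
    """
    text = _ESCAPED_TAG.sub(" ", text)  # a space keeps adjacent words apart
    return re.sub(r"\s+", " ", text).strip()
```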

New feature:

Index only parts of a page, defined by <element> . . . </element>

This feature is intended to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc.

If enabled in Admin settings, the values defined in the list file .../include/common/elements_use.txt are used to index only the page content between <element> . . . </element>.

For details see chapter Indexing only parts of a page defined by <element> . . . . </element>

New feature:

Ignore parts of a page, defined by <element> . . . </element>

This feature is intended to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc.

If enabled in Admin settings, the values defined in the list file .../include/common/elements_not.txt are used to remove the content between <element> . . . </element> from the page content.

This is the inverse function of 'Index only parts of a page, defined by <element> . . . </element>'.

For details see chapter Ignoring parts of a page defined by <element> . . . . </element>

New feature:

Index only files and documents with defined suffixes:

If activated, all pages of the site will be searched for links, but only files with suffixes as defined in the docs list will be indexed.

For details see chapter Index only files and documents with defined suffix

New feature:

1. Perform a WHOIS check for sites waiting for approval in Admin backend.

2. Perform a WHOIS check for suggested URLs directly in the addurl form, so that invalid URLs are rejected automatically.

For both checks, either a basic list of WHOIS servers for the generic top-level domains and some important country codes (30 suffixes) or an extended list (155 suffixes) can be selected.

New option to be activated in Admin backend:

Crawler can leave domain during index procedure, but only for canonical links.

Only the canonical link itself is indexed; links found on that page are ignored.

New feature:

Obey the 'refresh' meta tags as part of HTML headers.

The redirection is now followed, and indexing is delayed as defined.

New option:

Support UTF-16 encoded sites. UTF-16 encoded sites are converted into UTF-8.

To be activated in Admin settings.

New option:

For the index procedure, always use the standard Firefox HTTP_USER_AGENT string and ignore the individually defined Sphider-plus string. To be activated in the Admin backend.

New feature:

Follow redirections invoked by JavaScript when sent as HTTP content.

Will obey directives like:

<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>

New feature:

Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes.

New feature:

Separate PDF converters supplied for 32-bit and 64-bit operating systems.

For details, please see the chapter: PDF converter for Linux/UNIX systems

New feature:

Follow links placed in JavaScript files. Will detect and follow links like

document.write(' <a href="new_12.pdf">All news 2012</a> ');

Also the complete content of

document.write(' this text in all rows');

will be indexed and stored as keywords in the database.

New feature:

Sites that require the crawler to accept a cookie are now indexed as well.

New feature:

In order to reduce transmission time, the crawler now requests gzip-compressed data from the remote server for the URL to be indexed.

New option:

In order to convert the text into UTF-8, use the charset definition supplied via HTTP by the remote server.

If this option is not activated in Admin Settings, the charset is extracted from the header of the files to be indexed. If none is found, as with PDF documents, the preferred charset is used.

New option:

Delete duplicate parts of the URL path found in the indexed page URL and the new links.

Unfortunately, some CMSs seem unable to build a correct path for relative links.

If activated in the Admin backend, these duplicate parts of the path are deleted from the link URL. Should only be activated when indexing sites created by such a CMS.

New feature:

Show a summary of the currently active user database at the bottom of the result listing.

If activated in the Admin backend, the number of sites, categories, page links and keywords is displayed.

New feature:

Automatically deleting invalid URLs from Admin 'Sites' view.

Improved 'Add site' function in Admin backend.

URLs with and without 'www' are now treated as equal, so duplicate sites are excluded.

Improved image indexing procedure:

phpBB images linked via PHP scripts are now indexed as well.

New option:

Suppress the file suffix from image file names for indexing.

Improved media indexing procedure:

If the title attribute is missing, the alt attribute is now used as the name of the media. If the alt attribute is missing as well, the file name is used as keyword.

Improved "banned domain" management:

Now storing the name and suffix of banned domains instead of their URLs.

Improved index procedure:

Links pointing back to the calling URI (self back-linking) are now ignored.

Improved link detection for relative links found in the full text.

Improved input protection against SQL injections

Improved Admin statistics:

Now also providing the IP, country code and country name for

- Search log

- Most popular searches

- Most popular page links

- Most popular media links

Updated GeoIP database, used to provide the IP, country code and country name for the Admin statistics. Now also supporting IPv6 addresses.

Support for ppt files temporarily removed on Windows systems, as the converter fails on large PowerPoint documents.

Bug fixed, which prevented category selection without activating the "Advanced search form" option.

Bug fixed that caused invalid URL encoding in result listing.

Bug fixed causing the error output "Unknown column 'naame' in field list" during media indexing.

Bug fixed that caused MySQL warning messages during index procedure at some older MySQL versions, if the URL to be indexed contained blank characters.

Bug fixed, which caused invalid URL creation for relative links containing a file name and/or query.

Bug fixed in option 'Crawler can leave domain'.

Bug fixed in option 'Use list of div ids to ignore the div content during index/re-index'.

Bug fixed in option 'Enable to decode entity coded sites into standard HTML characters'.

Bug fixed in 'addurl' form, which prevented input of words containing accents in 'title' and 'description' fields.

Some additional small bugs killed.

 

Involved files that have been modified / added for this release:

.../addurl.php

.../admin/admin.php

.../admin/admin_header.php

.../admin/admin_search.php

.../admin/auth.php

.../admin/auth_bypass.php

.../admin/auth_db.php

.../admin/configset.php

.../admin/db_activate.php

.../admin/db_config.php

.../admin/db_main.php

.../admin/geoip.php

.../admin/GeoIP.dat

.../admin/http.php

.../admin/index_media.php

.../admin/install_tables.php

.../admin/messages.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../converter/feed_parser.php

.../converter/pdftotext32.script

.../converter/pdftotext64.script

.../include/click_counter.php

.../include/commonfuncs.php

.../include/domain_whois.php

.../include/idna_converter.php

.../include/media_counter.php

.../include/search_10.php

.../include/search_40.php

.../include/search_50.php

.../include/search_media.php

.../include/searchfuncs.php

.../include/suggest.php

.../include/common/docs.txt

.../languages/ all files

.../templates/html/020_search-form.html

.../templates/html/090_footer.html

.../templates/html/091_footer.html


Attention: This version requires an updated set of database tables. It is strongly recommended to follow the instructions as described in chapter: "Updating from 2.x to 2.y" for version 2.9




Version: 2.8

Release date: March 31, 2012

Build up with Sphider: v.1.3.5


Compared to version 2.7, the following modifications have been added:


New feature:

Same results for queries typed with plain vowels or with accented vowels. Queries like cafe and café deliver the same results. To be activated in Admin backend.
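One common way to obtain accent-insensitive matching is to fold both the indexed keywords and the query words to their base letters. The Python sketch below uses Unicode decomposition (NFD) and drops combining marks; it illustrates the idea and is not necessarily how Sphider-plus implements it. The name fold_accents is hypothetical.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Map accented letters to their base letters, e.g. 'café' -> 'cafe'.

    Letters without a decomposition (e.g. 'ß', 'ø') are left unchanged.
    fold_accents is a hypothetical helper name.
    """
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base).lower()
```

If the folded form is stored next to each keyword and the query is folded the same way, 'cafe' and 'café' hit the same entries.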


New feature for AND and OR search:

If the text extract in the result listing is too short to highlight all search words, additional text extracts are built so that all search words of the query are highlighted.


New feature:

Besides bulk re-indexing of all sites, the periodical re-indexer is now also available per site.

To be activated individually in the "Options" menu of each site.


New feature:

Limit the length of the full text indexed for each page. If set to values like 500 or 1000, keywords are extracted only from the first part of the full text.


New option to be set in Admin backend:

Block all queries sent by Meta search engines like Google, MSN, Amazon, etc.

For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil."


New option to be set in Admin backend:

Block all queries sent by crawlers known to be evil.

For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil."


New option to be set in Admin backend:

Delete special characters inside words. Underscores, hyphens and symbols like ‘ ・ “ etc. that are part of words are deleted, so only the pure words are indexed.
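A minimal Python sketch of this cleaning step (an illustration, not the Sphider-plus PHP code). It assumes the set of removed characters is the underscore, the hyphen and the symbols named in this entry, and that they are removed only when they sit between two letters or digits; the name join_inner_symbols is hypothetical.

```python
import re

# Assumed separator set; removed only between two letters/digits.
# [^\W_] matches a letter or digit (a word character other than '_').
_INNER_SYMBOLS = re.compile(r"(?<=[^\W_])[_\-‘・“]+(?=[^\W_])")


def join_inner_symbols(text: str) -> str:
    """Delete separators inside words: 'e-mail' -> 'email'.

    join_inner_symbols is a hypothetical helper name.
    """
    return _INNER_SYMBOLS.sub("", text)
```

Separators standing alone between blanks are left for the usual word splitting.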


New feature:

The indexer can be interrupted periodically after indexing a predefined number of pages (links).

Configurable in Admin settings.


New option to be activated in Admin backend:

Convert all kinds of double quotes like “ and ” into standard quotes "


New option to be activated/disabled in Admin backend:

Show time elapsed (to fetch the results) in result header.


New option to be activated/disabled in Admin backend:

In the result listing, show the sequential number of each result.


New option to be activated/disabled in Admin backend:

In result listing show the URL of each result in a separate row.


New option to be selected in Admin backend:

Define the default sort order for the media result listing:

- By title (alphabetic)

- By image size

- By 'Last queried'

- By 'Most popular'

- By file suffix


New option to be activated in Admin backend:

Limit the amount of media results presented together with text results.

Defined as the maximum number of media results per page. Image results are counted separately from audio and video streams.


New method of thumbnail storage:

The thumbnails are no longer stored in a sub folder of the Sphider-plus installation, but now are stored in database table "media" in field "thumbnail".


Improved media search:

AND, OR and TOLERANT modes are now selectable for media search, while the PHRASE mode will be interpreted as an AND search.


Improved media search:

From now on, both the file name and the title are queried to find media results.


New options to be defined in Admin backend:

The following basic indexing options are globally definable for all sites:

- Spidering depth: Full Index or folder depth definition

- Spider can leave domain

- Use preferred charset for indexing

Afterwards, individual settings can be made per site in the advanced options of each site URL. The global settings are also used for suggested sites (addurl form).


New option in Admin 'Clear' menu:

Clear all entries in 'Addurl' table.


New option in Admin 'Clear' menu:

Clear all entries in 'Banned' table.


Improved option:

Ignoring parts of a page defined by <div id='abc'> now also works for <div class='abc'>.

Besides a plain string list, the divs_not.txt file may now alternatively contain regexp patterns.


Improved option:

Indexing only parts of a page defined by <div id='abc'> now also works for <div class='abc'>.

Besides a plain string list, the divs_use.txt file may now alternatively contain regexp patterns.


Presentation of multiple hits in the result listing is now also enabled for strict search.


Language files added for Norwegian (nynorsk and bokmål). Thanks to Geir Kleiveland.


The whitelist and blacklist, as well as the other lists in the .../include/common/ folder, now tolerate (ignore) blank rows.


Improved index procedure, now also accepting links containing "blank" characters.


Improved "Erase & Re-index all" function. Now deleting also the "Pending" and "Temp" tables.


Support for Greek language totally rewritten. Now accepting Latin characters for old and new Greek transcription.

For details see chapter: "Greek language support"


Improved parser for RSS v.2.0 feeds.


Bug fixed in index procedure, which prevented correct indexing of text placed behind multiple tabs.


Bug fixed in search function for searching in multiple databases.


Bug fixed in result listing when presenting multiple hits per page.


Some more small bugs killed.



Involved files that have been modified / added for this release:

.../admin/admin.php

.../admin/admin_header.php

.../admin/auth.php

.../admin/auto_index.php

.../admin/configset.php

.../admin/db_copy.php

.../admin/db_main.php

.../admin/index_media.php

.../admin/install_tables.php

.../admin/messages.php

.../admin/real_log.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../converter/feed_parser.php

.../include/click_counter.php

.../include/commonfuncs.php

.../include/make_captcha.php

.../include/media_counter.php

.../include/search_10.php

.../include/search_40.php

.../include/searchfuncs.php

.../include/search_media.php

.../include/show_id3.php

.../include/suggest.php

.../include/common/black_ips.txt

.../include/common/black_uas.txt

.../languages/nn-language.php

.../languages/no-language.php

.../templates/html/010_html_header.html

.../templates/html/011_html_header.html

.../templates/html/020_search-form.html

.../templates/html/021_search-form.html

.../templates/html/022_search-form.html

.../templates/html/050_result-header.html

.../templates/html/060_text-results.html

.../templates/html/070_more-results.html

.../templates/html/090_footer.html

.../templates/html/120_media-only results.html

.../templates/html/140_image-results.html


Attention: This version requires an updated set of database tables. It is strongly recommended to follow the instructions as described in chapter: "Updating from 2.x to 2.y" for version 2.8.




Version: 2.7

Release date: October 18, 2011

Build up with Sphider: v.1.3.5

 

Compared to version 2.6, the following modifications have been added:


New indexing feature:

Re-indexing can be performed periodically. Once started, this mode automatically re-indexes all sites at an Admin-selectable interval of 3 hours, 12 hours, 1 day, 1 week or 1 month.

The number of periodic re-indexing runs is also Admin-selectable.

For details see chapter: Periodical Re-indexing


New feature for media search:

Find media results not only by media 'title', but also by EXIF and ID3 info.

To be activated in Admin backend.


New option in Admin settings called:

"Use string list of 'URL Must Not include' also to prevent erasing of involved URLs"

If activated, erasing of the involved sites and pages (links) is prevented as well.

In order to erase all sites and all pages completely, it might become necessary to uncheck this option.


Improved search form. Now offering separated search buttons for 'text' and 'media' queries, as well as a button for combined search.


Improved search procedure for combined text and media search, which speeds up the search.


Improved delete function in Admin backend:

If a site is deleted from the Admin backend, all keyword relationships to that site are now also withdrawn from the database. Site-specific links, category relationships and other dependencies, like registrations in temporary and pending tables, were already handled before.


Improved Admin search function:

When searching for 'Sites', the result listing now also presents the 'Options' button to select Edit, Re-Index, Erase & Re-index, Erase, Delete, Pages, Browse and Statistics.


Improved index procedure for media indexing:

No longer accepting dead links. In order to become indexed, the media file must be present.


Improved index procedure to speed up indexing.


Improved index procedure to cooperate with those servers that do not accept basic authentication strings.


Improved index procedure:

If the 'User Agent String' defined in the Sphider-plus Admin backend is not accepted by the site to be indexed, Sphider-plus uses a standard browser HTTP_USER_AGENT to connect to the site.


New algorithm to delete HTML and PHP tags:

The PHP function strip_tags() is no longer used; unclosed and invalid tags are now also handled during the index procedure. As a result, the text following an unclosed or invalid tag is indexed as well. Previously, this part of the full text was cut off by strip_tags().
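A minimal Python sketch of tolerant tag removal (an illustration, not the Sphider-plus PHP code). Only complete tags, a '<' closed by '>' before the next '<', are removed; a stray '<' of an unclosed tag is dropped so that the following text survives. As a simplification the tag name of an unclosed tag remains as a word. The name strip_tags_tolerant is hypothetical.

```python
import re

# A complete tag: '<' ... '>' with no other '<' or '>' in between.
_CLOSED_TAG = re.compile(r"<[^<>]*>")


def strip_tags_tolerant(html: str) -> str:
    """Remove complete tags but keep the text after an unclosed '<'.

    strip_tags_tolerant is a hypothetical helper name.
    """
    text = _CLOSED_TAG.sub(" ", html)   # a space keeps adjacent words apart
    text = text.replace("<", " ")       # drop the '<' of unclosed tags
    return re.sub(r"\s+", " ", text).strip()
```

For "<p>Hello</p> <b world" the result is 'Hello b world', so 'world' is still indexed.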


Modified index procedure:

The instructions 'RESET QUERY CACHE' and 'FLUSH TABLE' are only used if the following Admin setting is activated:

'Clean resources during index / re-index and also for search function'


Improved 'Settings' interface in Admin backend. After pressing 'Save', any existing dependencies and the settings that had to be modified accordingly are now presented as well.


Improved 'Approve sites' menu. If categories are available, new sites are placed in category 'none' by default.


Improved search function:

If the Admin backend option

'Delete special characters like dots, commas, exclamation and question marks etc. as part of words'

is activated, the search query is also cleaned of these secondary characters.

Consequently queries like 'book: kellner' and 'kellner, rolf' will no longer fail.

This modification will not be active for 'Phrase' search.


Improved search function for queries containing hyphens.


Improved HTML files; the search form now loads faster.


Improved display output for main categories.


Improved 'addurl' form. Now accepting URLs without www.


Improved 'addurl' form. If categories are available, a newly suggested site is placed in category 'none' by default.


Common word list added for Chinese language. With thanks to Jame Sian 孙 春淦


Updated framework for ID3 and EXIF extraction during media indexing.


Updated GeoIP database, used to provide the IP of the search user.


Updated IDS configuration file, default filter and converter.


Updated language files for Czech and Slovenian language. Thanks to Peter Krupa.


Updated suffix list, holding all the file suffixes that are not to be indexed.


Bug fixed in Database backup script that prevented correct storage of index-date.


Bug fixed in suggest framework to enable suggestions for queries using main-categories.


Bugs fixed, which prevented disabling the IDS framework for 'Search User' and 'Suggest User'.


Bug fixed in option "Ignoring parts of a page by <div id='abc'>" for multiple nested divs.



Involved files that have been modified / added for this release:

.../addurl.php

.../search_ini.php

.../admin/admin.php

.../admin/admin_header.php

.../admin/admin_search.php

.../admin/auto_index.php

.../admin/db_common.php

.../admin/configset.php

.../admin/index_media.php

.../admin/messages.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../admin/getid3/all files

.../include/commonfuncs.php

.../include/search_10.php

.../include/search_40.php

.../include/search_media.php

.../include/searchfuncs.php

.../include/suggest.php

.../include/common/common_cn.txt

.../include/common/suffix.txt

.../languages/all files

.../templates/010_html-header.html

.../templates/011_html-header.html

.../templates/html/020_search-form.html

.../templates/html/040_category_tree.html

.../templates/html/060_text_results.html

.../templates/html/110_media-only header.html

.../templates/html/200_no media found.html

.../templates/html/sphider-plus.ico




Version: 2.6b

Release date: March 25, 2011

Build up with Sphider: v.1.3.5

Debugged version of v.2.6



Compared to version 2.6, the following modifications have been added:

New Admin setting:

Protect the .../admin/ folder by means of a .htaccess file.

If activated, and if the .htaccess file is not yet available, the script will automatically detect the IP of the admin and create your .htaccess file in the ../admin/ folder.

If the setting is deactivated (checkbox), the .htaccess file will be deleted by the script, so that afterwards the admin folder is freed again for IP independent access.

New feature:

Result listings 'By URL names' and 'Like Google' are sorted in alphabetic order now.

New feature:

The words specified in the common list (to be ignored during the index procedure) are no longer case sensitive. Consequently, words like 'Sphider' and 'sphider' need not both be included.

Improved calculation of keyword weighting. Now working independent from lower case and/or upper case written text.

Bug fixed that prevented renaming of the default search script.

Bug fixed in multithreaded indexing.

Bug fixed to prevent creation of duplicate sub folders in .../admin/

Media search enabled for multiple database support.

User debug mode enabled for link search.

Indexing of https:// sites enabled.

Bug fixed for applications not using the advanced search options.

Bug fixed for embedded application.

Bug fixed for result sorting (By URL names).

Bug fixed in 'More results from URL'.

Bug fixed for 'Use commonlist for words to be ignored during index / re-index'.

Bug fixed in 'Use blacklist to prevent index of pages'.


Involved files that have been modified / added for this debug release:

.../search.php

.../search_ini.php

.../admin/admin.php

.../admin/configset.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../admin/settings/backup/Sphider-plus_default-configuration.php

.../admin/thumbs/.htaccess

.../include/categoryfuncs.php

.../include/commonfuncs.php

.../include/search_10.php

.../include/search_40.php

.../include/searchfuncs.php

.../include/search_links.php

.../include/search_media.php

.../templates/html/ all files


Version: 2.6

Release date: March 08, 2011

Build up with Sphider: v.1.3.5

Compared to Sphider-plus version 2.5, the following items have been added / modified:


New feature:

Result output is now also available as an XML file. If requested in the search.php script, the results are presented as an XML file in .../xml/

For details see the documentation chapter: XML result output


New feature:

Index only parts of a page by <div id='abc'>

If enabled in Admin settings, the values defined in the list file .../include/common/divs_use.txt are used to index only the content between <div id='abc'> and </div>.

This is the counterpart of: Ignoring parts of a page by <div id='abc'>,

which is controlled by the list file .../include/common/divs_not.txt

For details see the documentation chapter: Indexing only parts of a page by <div id='abc'>


New feature:

Individual (Admin) settings for each database and each set of tables.

Automatically activated by selecting any of the 5 databases and any set of tables in these databases.


New feature in Admin backend:

'Search' functions are available now in order to query for:

- sites

- links

- keywords

- categories


New Admin setting:

Define number of sites shown per page in Admin backend (pagination 10, 20, 30, 50, 100).

Used for:

- Sites view

- Approve URLs

- Banned domains

- Statistic results


Improved Admin settings:

The table in the Admin backend 'Sites' view can be sorted:

- by index-date, latest on top

- by index-date, oldest on top

- by title as personally defined when adding the site

- in alphabetic order (URL)


New feature:

Additional option to Re-index only the sites that are currently shown in 'Sites' view.

By selecting (pagination) 10, 20, 30, 50 or 100 sites per page, it is possible to re-index only the URLs presented on page 1, later those of page 2, etc.


New Admin setting:

Always use the preferred charset defined in 'General Settings' for indexing.

The corresponding option is found in the 'Edit' options of each site, so that individual sites can be configured. If activated, header information like

<meta http-equiv='Content-Type' content='text/html; charset=windows-1256' />

of the site to be indexed is overridden by the preferred charset.


New Admin setting:

Separated activation of debug mode for Admin backend and User interface.


New Admin setting:

Do not index the full text. If activated, only the page 'Title', the 'Keywords' Meta tag and the 'Description' Meta tag are indexed. Nevertheless, links found in the full text are still followed.


New feature:

Queries containing ' && ' override the advanced search setting with AND.

Queries containing ' || ' override the advanced search setting with OR.
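A minimal Python sketch of this override (an illustration, not the Sphider-plus PHP code). Two points are assumptions not stated here: ' && ' wins if both operators appear, and the operators are removed from the list of search words. The name search_mode is hypothetical.

```python
def search_mode(query: str, default: str = "AND"):
    """Return (mode, words); ' && ' forces AND, ' || ' forces OR.

    Assumptions: ' && ' takes precedence over ' || ', and the operator
    tokens are dropped from the word list. Hypothetical helper name.
    """
    if " && " in query:
        mode = "AND"
    elif " || " in query:
        mode = "OR"
    else:
        mode = default  # the advanced search setting
    words = [w for w in query.split() if w not in ("&&", "||")]
    return mode, words
```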


Complete redesign of all search files for easier integration of Sphider-plus scripts into an existing HTML site.

With special thanks for the suggestions, ideas and the participation of Carl Erling

http://www.tba-berlin.de


New Admin setting:

Define whether the 'Search form' and the 'Result listing' of Sphider-plus is embedded into an existing page HTML layout and design, or whether it is used as an independent page.

For details see the documentation chapter: Integration of Sphider-plus into existing sites


New Admin setting:

Define the name of the search script in root folder of Sphider-plus (default: search.php).


Separate style sheet files are now included for the Admin backend and for the user interface. This makes it possible to customize the user style sheet without affecting the Admin design.

For details see the documentation chapter: Integration of Sphider-plus into existing sites


Improved 'Did you mean' algorithm. Now searching for a wider range of potential keywords.


Break characters inside words are now ignored, so that the complete word is indexed and searchable.


Output of Intrusion Detection System now is presented with respect to the currently activated template design.


Improved backup for Admin 'Settings'. The name of the backup file will now consist of:

- Date and time

- Number of database

- Name of table prefix

Consequently, each backup file can be clearly assigned to its date, database and table prefix.


Automatically add "http://" for new sites in Admin backend.


Bug fixed that prevented limiting of search results when multiple databases were selected to deliver search results.

Bug fixed that created multiple wildcards, if searching for numbers.

Bug fixed that suppressed the HTML header in link search.

Bug fixed that overrode the Admin setting

"Show x results per page in result listing"

caused by the following line in .../include/searchfuncs.php:

if ($all_wild && $greek != '1') $max_hits = '99';

Bug fixed to prevent a blank display on first opening the Admin backend (if mb_string functions are not available).

Bug fixed that caused invalid URL parsing for relative links with ../../ notation.

Bug fixed that prevented domain search for localhost applications

Bug fixed to prevent invalid character size for ‘Like Google’ result listing

Bug fixed in database 'Backup & Restore' function.

Some additional small bugs removed.
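One fix above concerns relative links containing ../../. The following Python sketch shows the expected resolution behavior using the standard library's RFC 3986 resolver. It is not the PHP code used by Sphider-plus. Dropping the #fragment is an added assumption, based on common crawler practice.

```python
from urllib.parse import urldefrag, urljoin

def resolve_link(page_url: str, href: str) -> str:
    """Resolve a (possibly relative) link against the page it was found on."""
    absolute = urljoin(page_url, href)   # handles ./, ../ and ../../ segments
    return urldefrag(absolute).url       # assumption: fragments are not indexed
```

For a link `../../img/x.jpg` found on `http://www.abc.de/a/b/c/page.html`, each `../` removes one directory level, giving `http://www.abc.de/a/img/x.jpg`.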


Involved files that have been modified / added for this release:

.../addurl.php

.../search.php

.../search_ini.php

.../admin/admin.php

.../admin/admin_header.php

.../admin/admin_footer.php

.../admin/auth.php

.../admin/configset.php

.../admin/db_main.php

.../admin/GeoIP.dat

.../admin/install_tables.php

.../admin/messages.php

.../admin/real_get.php

.../admin/real_log.php

.../admin/settings/backup/all files

.../admin/setting_backup.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../include/commonfuncs.php

.../include/media_counter.php

.../include/search_10.php

.../include/search_20.php

.../include/search_30.php

.../include/search_40.php

.../include/search_50.php

.../include/search_links.php

.../include/search_media.php

.../include/searchfuncs.php

.../include/show_id3.php

.../include/suggest.php

.../include/common/audio.txt

.../include/common/divs.txt

.../include/IDS/Config/Config.ini.php

.../settings/all files and folders

.../templates/html/all files

.../templates/Pure/adminstyle.css

.../templates/Pure/userstyle.css

.../templates/Slade/adminstyle.css

.../templates/Slade/userstyle.css

.../templates/Sphider-plus/adminstyle.css

.../templates/Sphider-plus/userstyle.css


Attention: This version requires an updated set of database tables. To create the new set of tables, run 'Install all tables' for each database in the 'Database Management / Configure' menu.




Version: 2.5

Release date: November 30, 2010

Build up with Sphider: v.1.3.5


Compared to Sphider-plus version 2.4, the following items have been added / modified:


New feature:

Bound database.

This option deletes all keyword relationships that exceed a definable number of query results.

Together with the result cache, this option significantly speeds up the search procedure for huge databases.

For details see chapter: Bound database


New feature:

Optionally, user-suggested sites must carry a meta tag in their header in order to be indexed.

The tag values are defined by the Sphider-plus admin during approval. They can be used to verify

the ownership of the suggested URLs, to make indexing subject to commercial agreements, or to perform a membership verification.

For details see chapter: User suggested sites


New feature:

Intrusion Detection System (IDS) included to protect Sphider-plus against hacking attempts.

It includes extensive regex rules against attacks like

XSS, SQLI, RCE, LFI, DT, CSRF, LDAP Injections, and DoS.

Depending on the Admin settings, the IDS blocks further user input, creates a log file, presents a warning message, or even blocks all traffic from IPs known to be malicious.

For details see chapter: Intrusion Detection System (IDS)


New feature:

Index only links and their link text.

If activated in Admin settings, full text and media content are not indexed. Only the link text (titles) of all links is indexed. This also works for image links, using their 'title' or, alternatively, their 'alt' attribute:

title="this text", alternatively alt="this text".

The result listing presents the (active) links together with the page on which they were found. All search modes are available when searching for link text.
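The 'index only links' mode can be illustrated with a Python sketch built on the standard `html.parser` module. It collects `(href, text)` pairs and ignores all other page text. For an image inside a link, it takes the `title` attribute and falls back to `alt`. The class name `LinkTextCollector` is hypothetical and does not come from Sphider-plus.

```python
from html.parser import HTMLParser

class LinkTextCollector(HTMLParser):
    """Collect (href, link text) pairs; text outside <a> elements is ignored."""

    def __init__(self):
        super().__init__()
        self.links = []     # list of (href, text)
        self._href = None   # href of the currently open <a>, if any
        self._parts = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a":
            self._href = a.get("href")
            self._parts = []
        elif tag == "img" and self._href is not None:
            # Image link: prefer the title attribute, fall back to alt
            label = a.get("title") or a.get("alt")
            if label:
                self._parts.append(label)

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join(" ".join(self._parts).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None
```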


New feature in Admin settings:

Add new domains found during index procedure to 'Approve Sites' table.

To be activated in section ‘General Settings’. This option is available for sites that have ‘Spider can leave domain’ activated in their options.


New feature in Admin sites menu:

Index all suspended sites.

Continues the index procedure for all sites marked as 'Unfinished'.


New feature:

Index media content with respect to frame/iframe position.

To be activated in Admin settings. This option allows indexing of media links that are addressed relative to the frame/iframe position (folder).


Improved URL import/export function:

All options of each site are now stored in the backup file and re-imported.


New Admin setting:

Clean query log during index / re-index and also for all erase functions.

This resets all 'Search' statistics in the Admin backend, as well as the

'Most popular search' table at the bottom of the result listing.


New feature in Admin backend:

When the Admin interface is opened, a warning message reports new suggested sites that are waiting for approval. This works independently of the currently activated databases, so suggestions in any database trigger an alert.


Dynamically created description tag in result page header.

Built from the titles of the most important results presented on the different result pages.


Some small bugs eliminated.



Involved files that have been modified / added for this release:

.../addurl.php

.../search.php

.../admin/admin.php

.../admin/admin_footer.php

.../admin/auth.php

.../admin/configset.php

.../admin/index_media.php

.../admin/install_tables.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../include/commonfuncs.php

.../include/ids_handler.php

.../include/search_links.php

.../include/searchfuncs.php

.../include/search_media.php

.../include/suggest.php

.../include/swfobject.js

.../include/tagcloud.swf

.../include/IDS/all files and ...

.../languages/all files

.../templates/html/010_html_header.html

.../templates/html/020_html_search-form.html

.../templates/html/021_html_search-form.html

.../templates/html/022_html_search-form.html

.../templates/html/050_result-header.html

.../templates/html/060_text-results.html

.../templates/html/070_more-results.html

.../templates/html/080_most_pop.html

.../templates/html/081_3D_tag_cloud.html

.../templates/html/100_all-media result-header.html

.../templates/Pure/all files

.../templates/Slade/thisstyle.css

.../templates/Sphider-plus/thisstyle.css


Attention: This version requires an updated set of database tables. To create the new set of tables, run 'Install all tables' for each database in the 'Database Management / Configure' menu.



Version v.2.4

Release date: July 03, 2010

Build up with Sphider: v.1.3.5


New feature:

Multithreaded indexing has been implemented to reduce indexing time.

Between 1 and 10 threads can be activated in the Admin settings.

Also available for command line operation, without a limit on the thread count.

For details see the documentation chapter: Multithreaded indexing


New feature:

Segmentation of Japanese phrases. To be activated in Admin settings.

Segments text written with 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese characters.

Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8.


New feature:

Index CSV files. To be activated in Admin settings.

Because it is implemented as a PHP script, the converter needs no adaptation to the Operating System.


New feature:

To index OpenDocument files, the corresponding converters have been integrated.

Because they are implemented as PHP scripts, the converters need no adaptation to the Operating System.

To be activated separately for spreadsheet files (.ods) and text files (.odt) in Admin settings.


New command line options:

- Erase content of database <-erase>

- Set ‘Last indexed’ date and time to 0000 <-preall>


Improved support for Japanese coded sites (charset: SHIFT_JIS, EUC-JP and UTF-8).


Template design 'Pure' reduced to 'like Google'.


Improved search algorithm: significantly reduced search time.


Improved support for Greek language:

1. Transliterate queries written with Latin characters into their Greek equivalents.

   For example, the query alla finds ἀλλὰ and the query baptismatos finds βαπτίσματος.

2. Accept Greek queries containing vowels without accents.

   A query containing the letter α also matches

   ἀ, ἁ, ἂ, ἃ, ἄ, ἅ, ἆ, ἇ, ὰ, ά, ά, ᾀ, ᾁ, ᾂ, ᾃ, ᾄ, ᾅ, ᾆ, ᾇ, ᾰ, ᾱ, ᾲ, ᾳ, ᾴ, ᾶ and ᾷ

   The same applies to all other Greek vowels, including the upper case vowels.

   Both options produce a more tolerant result listing.
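Both Greek options can be sketched in Python. Accent tolerance follows from Unicode canonical decomposition (NFD). This splits each vowel into a base letter plus combining accents, breathings and iota subscripts, which can then be dropped. The Latin-to-Greek map below is a hypothetical, simplified table for illustration only. Sphider-plus's actual transliteration rules are not documented here, and this table cannot produce η or ω.

```python
import unicodedata

# Hypothetical, simplified Latin-to-Greek map; digraphs are tried first.
LATIN_TO_GREEK = {
    "th": "θ", "ch": "χ", "ps": "ψ", "ph": "φ",
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ", "i": "ι",
    "k": "κ", "l": "λ", "m": "μ", "n": "ν", "x": "ξ", "o": "ο", "p": "π",
    "r": "ρ", "s": "σ", "t": "τ", "u": "υ", "y": "υ", "f": "φ",
}

def fold_greek(text):
    """Lower-case, drop accents, breathings and iota subscripts, unify final sigma."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.replace("ς", "σ")

def transliterate(query):
    """Map a Latin-letter query to Greek letters, longest match first."""
    q, out, i = query.lower(), [], 0
    while i < len(q):
        pair = q[i:i + 2]
        if len(pair) == 2 and pair in LATIN_TO_GREEK:
            out.append(LATIN_TO_GREEK[pair])
            i += 2
        else:
            out.append(LATIN_TO_GREEK.get(q[i], q[i]))
            i += 1
    return "".join(out)

def matches(query, keyword):
    """True if the folded or transliterated query equals the folded keyword."""
    target = fold_greek(keyword)
    return fold_greek(query) == target or transliterate(query) == target
```

Folding both sides, the query and the stored keyword, is what makes the match tolerant. The index can keep the original accented words for display.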


New options in 'Add site' and 'Edit site' menus:

- Enter URL of individual Sitemap

If the Sitemap is not in the root folder, its individual URL can be entered.


New option to manipulate the result listing:

For result sorting 'By URL names', the number of results shown per domain is selectable.

This offers a result presentation similar to 'Like Google', but with a selectable number of links.


The search option 'More results from this domain' is now enabled not only for result sorting 'Like Google', but also for 'By URL names'.


Bug fixed that prevented correct interpretation of http 301 redirects.


Bug fixed that caused invalid results for multiple word queries containing numbers, like 'price 25 euro'.


Bug fixed that prevented highlighting of keywords found at position 0 of the title or full text.


Bug fixed that prevented disabling 'Show result scores' in Admin settings.


Bug fixed that prevented the option 'temporary ignore no-index' from being followed.


Additional language file for Japanese. Thanks to Sano Tomonori.


Updated Arabic language file. Thanks to Naif Alanazi.


Updated Italian language file. Thanks to Giorgio Nanni.



Involved files that have been modified / added for this release:

.../search.php

.../admin/ all files

.../converter/ods_reader.php

.../converter/odt_reader.php

.../converter/dictionaries/jp_shiftJIS.dic

.../converter/OpenDocumentSheet/ all files

.../include/commonfuncs.php

.../include/searchfuncs.php

.../include/search_media.php

.../include/suggest.php

.../languages/ all files

.../templates/html/all files

.../templates/Pure/thisstyle.css

.../templates/Slade/thisstyle.css

.../templates/Sphider-plus/thisstyle.css



Version v.2.3

Release date: April 23, 2010

Build up with Sphider: v.1.3.5


To ease the integration of Sphider-plus into existing sites, HTML templates are provided for:

- Search form

- Text results

- Media results

- Most popular queries

- etc.


New feature:

Allow indexing of other hosts with the same domain name for links found during indexing. TLD, SLD and www are ignored.

More details in documentation chapter: Allow other hosts in same domain


New feature:

Allow indexing of other hosts with the same domain name, but only if the found links are redirected. TLD, SLD and www are ignored.

More details in documentation chapter: Allow other hosts in same domain


New feature:

Index sites and follow links containing non-‘Basic Latin’ and non-ASCII characters in their URLs.


Two new features for sorting the result listing:

- Results of a promoted / featured domain will be displayed at the top of the search result listing.

   A domain name, or part of it, can be entered in the Admin settings.

   All search results belonging to this domain will be placed at the top of the result listing.

- Pages containing a catchword will be displayed at the top of the search result listing.

   The catchword can be entered in the Admin settings.

More details in documentation chapter: Chronological order for result listing
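Both sorting features amount to a stable partition. Results that match a condition move to the top, and each group keeps its relevance order. The Python sketch below assumes each result is a dict with `url` and `text` keys. The predicate helpers `in_domain` and `has_catchword` are hypothetical names, not Sphider-plus functions.

```python
from urllib.parse import urlsplit

def move_to_top(results, predicate):
    """Stable partition: matching results first, each group in original order."""
    top = [r for r in results if predicate(r)]
    rest = [r for r in results if not predicate(r)]
    return top + rest

def in_domain(name_part):
    # Hypothetical predicate: the host name contains the configured domain (part)
    part = name_part.lower()
    return lambda r: part in (urlsplit(r["url"]).hostname or "")

def has_catchword(word):
    # Hypothetical predicate: the page text contains the configured catchword
    w = word.lower()
    return lambda r: w in r["text"].lower()
```

Because the results already arrive in relevance order, no re-scoring is needed. `move_to_top(results, in_domain("promo"))` simply lifts the promoted domain's pages above the others.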


New feature:

Split words into their basic parts at each hyphen, dot or comma inside the word.

For example, 'sphider-plus.eu' will be divided into the 3 keywords: sphider plus eu

Because the original word is also stored as a keyword, all 4 words become searchable.

Alternatively, splitting at hyphens only can be selected in Admin settings.
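The splitting rule above can be sketched in a few lines of Python. This illustrates the described behavior, not the PHP code in Sphider-plus. The original word is kept as the first keyword, and its parts are added only once each.

```python
import re

def split_keywords(word, hyphens_only=False):
    """Return the original word plus its parts split at '-', '.' and ','."""
    pattern = r"-" if hyphens_only else r"[-.,]"
    keywords = [word]
    for part in re.split(pattern, word):
        if part and part not in keywords:  # skip empty pieces and duplicates
            keywords.append(part)
    return keywords
```

`split_keywords("sphider-plus.eu")` returns the 4 searchable keywords `["sphider-plus.eu", "sphider", "plus", "eu"]`. With `hyphens_only=True`, the result is `["sphider-plus.eu", "sphider", "plus.eu"]`.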


New feature:

Index the "Description" Meta tag in HTML header. To be activated in Admin settings.


New feature:

Indexing of media files is now enabled on servers that do not offer all PHP functions for remote files.

Bypassed PHP functions: fopen(), file_get_contents(), md5_file()


Three new features for command line operation:

- Erase & Re-index all sites ( -eall )

- Index all new URLs in the database that have not yet been indexed ( -new )

- Re-index all meanwhile erased sites ( -erased )


New feature:

To index XLS files, a converter for Excel files was developed. Because it is implemented as a PHP script,

the converter needs no adaptation to the Operating System.


New Admin setting:

Index RAR compressed files and archives.

Supports (X)HTML, XML and also compressed PDFs and other document files, as well as all kinds of feeds,

frames and iframes. Links found in the compressed files will be followed.


Fifteen language-specific stemming algorithms implemented, individually selectable for:

Bulgarian, Chinese, Czech, Dutch, English, Finnish, French, German,

Greek, Hungarian, Italian, Portuguese, Russian, Spanish and Swedish.

More details in documentation chapter: Word stemming


New Admin setting:

Activate/disable the creation of a 'sitemap.xml' file for each indexed site.


New Site option in Admin menu:

Erase/clean site-specific data from MySQL database and thumbnails folder for a selected site.


New Admin setting:

Re-index all meanwhile erased sites.


New Admin setting:

Show the complete list during import and export of URLs, or hide the output.


24 language-specific common files holding lists of words to be ignored during indexing (stop words).

Added or updated for:

Arabic, Bengali, Bulgarian, Catalan, Czech, Danish, Dutch, English,

Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian,

Polish, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.

To speed up the index procedure, they are activated individually in Admin settings.


New feature:

Self test for all required PHP libraries and extensions. If Debug mode is enabled,

the corresponding warning messages will be presented at the top of the Settings menu.


Improved database 'Activate / Disable' menu:

If multiple table sets exist for a database because they were created earlier,
any of these table sets can be activated by selecting the corresponding prefix.


New Admin setting:

Define directory for templates (relative to root directory of Sphider-plus)


Searching and opening media files is now enabled for media links of up to 1024 characters.


Input fields in the database configuration menus now accept values of up to 255 characters.


'Clean resources' improved for index procedure.

In case of failure, only warning messages will be created and indexing will not be aborted.


The feature 'Clean resources' is now also available for the search procedure.

It is activated jointly for the search and index procedures in Admin settings.


If debug mode is enabled, new keywords are now presented in alphabetical order during the index procedure.


Follow ‘robots.txt’ directive enabled also for localhost applications.


Bug fixed that caused the result listing to be presented in lower case characters only.

Results are now presented like the original title and full text of the indexed pages.


Some more small bugs eliminated.


Updated Admin dialog. Thanks to Ian Bucklar.


Additional language file for Hebrew. Thanks to Noam Bercovitz.


Updated Russian language file. Thanks to Uttkirbek Abdullaev.


Updated Romanian language file. Thanks to Lionel Geo Mischie.



Involved files that have been modified / added for this release:

.htaccess

search.php

.../admin/admin.php

.../admin/configset.php

.../admin/db_config.php

.../admin/index_media.php

.../admin/install_tables.php

.../admin/messages.php

.../admin/real-log.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../admin/url_backup.php

.../admin/getid3/module.audio-video.asf.php

.../converter/catdoc.script

.../converter/xls2csv.exe (not required any longer)

.../include/commonfuncs.php

.../include/searchfuncs.php

.../include/search_media.php

.../include/common/all common_xyz.txt files

.../include/stemming/all files

.../languages/he-language.php

.../languages/ro-language.php

.../languages/ru-language.php

.../settings/conf.php

.../templates/all folders/thisstyle.css

.../templates/html/all files



Version v.2.2

Release date: December 22, 2009

Build up with Sphider: v.1.3.5

Compared to Sphider-plus version 2.1, the following items have been added / modified:


Improved multiple database support:

Results may now be collected from more than one database.

Between 1 and 5 databases can be configured to fetch results for the common result listing.

Valid for text and media search, all search modes, taking into account category selection.

More details in documentation chapter Activate / Disable databases


Improved RSS and Atom feed index procedure. It now also validates that the XML is well-formed.


Support added for RDF feeds.

For a complete list of indexed items, please see the documentation chapter:

RDF, RSD, RSS and Atom feeds


Additional item in Admin settings:

Follow CDATA directives for feed content.


Additional item in Admin settings:

Index 'Dublin Core' and other individually marked tags in RDF feeds.


Additional item in Admin settings:

Follow the 'preferred (true/false)' directive in RSD feeds.


Detection of encoding (charset) added for XML and XHTML files.


New item in Admin settings:

During the index procedure, convert all kinds of single quotes like ` ´ ’ ‘ into the standard quote '


New item in Admin settings:

Reduce queries that contain quotes to the basic word.

This will deliver the same results for queries like:

d'information = information   or   dei'largi = largi

Results are highlighted for the base word only, excluding the prefix (noun, pronoun, etc.).

Works for all kinds of single quotes.
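The two quote-related settings can be combined in a short Python sketch. First, every quote variant is normalized to the standard quote. Then the query is reduced to the part after the last quote. The "last quote" rule is an assumption inferred from the two examples above, not a documented Sphider-plus rule.

```python
QUOTE_VARIANTS = "`´’‘"
TO_STANDARD = {ord(q): "'" for q in QUOTE_VARIANTS}

def normalize_quotes(text):
    """Replace every single-quote variant by the standard quote '."""
    return text.translate(TO_STANDARD)

def base_word(query):
    """Assumed rule: keep the text after the last quote, if there is any."""
    q = normalize_quotes(query)
    _, sep, tail = q.rpartition("'")
    return tail if sep and tail else q
```

`base_word("d'information")` and `base_word("dei’largi")` return `information` and `largi`. Words without a quote are returned unchanged.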


New Admin setting:

For queries containing numbers, search with wildcards.

Useful for searching complex article numbers when the user knows only part of the complete item description.


New Admin setting:

Index ZIP compressed files and archives.

Supports (X)HTML, XML and also compressed PDFs and other document files,

as well as all kinds of feeds, frames and iframes. Links found in the compressed

files will be followed.


New option to sort the result listing:

Sort by last indexed (date and time). To be defined in Admin settings.


New option to limit result listing:

Define the maximum number of results presented in the result listing.

To be defined in Admin settings, the number of results will be limited for text and media results.


New item in Admin settings:

Use a list of div IDs to ignore the corresponding div content during index / re-index.

A common list of div id values is used to ignore parts of a page.

Content between <div id='this_value'> and </div> will be ignored,

but links inside it are still followed. Multiple and nested divs are handled.

Values in the common list may end with a wildcard, so that 'menu*' will work for

menu1, menu2, menu_left, etc.

Also usable for external pages, where it is impossible to add the <!--sphider_noindex--> tags.
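The div filter described above can be sketched with Python's `html.parser`. It tracks nesting with a stack, so an ignored region ends only at its own closing `</div>`. Text inside an ignored region is dropped, while links are still collected everywhere. The class name `DivFilter` is hypothetical, and the wildcard is treated as a trailing prefix match only, as the changelog describes.

```python
from html.parser import HTMLParser

def id_matches(div_id, pattern):
    # A trailing '*' means prefix match: 'menu*' matches 'menu1', 'menu_left', ...
    if pattern.endswith("*"):
        return div_id.startswith(pattern[:-1])
    return div_id == pattern

class DivFilter(HTMLParser):
    def __init__(self, ignored_ids):
        super().__init__()
        self.ignored_ids = ignored_ids
        self.text = []          # text kept for indexing
        self.links = []         # links are followed even inside ignored divs
        self._div_stack = []    # one flag per open <div>: starts an ignored region?
        self._ignored_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag == "div":
            div_id = attrs.get("id") or ""
            ignored = any(id_matches(div_id, p) for p in self.ignored_ids)
            self._div_stack.append(ignored)
            if ignored:
                self._ignored_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self._div_stack and self._div_stack.pop():
            self._ignored_depth -= 1

    def handle_data(self, data):
        if self._ignored_depth == 0 and data.strip():
            self.text.append(data.strip())
```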


Common 'URL Must include' and 'URL must Not include' rules, which are valid for all new sites, can now be placed in two files. Their contents are transferred to the corresponding option fields when 'Add site' is called in the Admin menu. Each rule can be de-selected individually by checkbox.

Details in documentation chapter: Must include / must not include string list


Log output is suppressed if the indexer is only redirected from

http://www.abc.de to http://www.abc.de/index.html


Improved response for 'canonical' links. Back references to the calling page are ignored now.


New option for iframe indexing in Admin settings:

Store the link to the iframe directly instead of the link to the calling page.


New Admin setting:

Also index duplicate media content if it is found on different pages.

If activated, all images, audio and video streams will be presented in the result listing.

Otherwise only the first occurrence (page/link) will be presented.


Index procedure improved for dynamically created links.


New option:

Suppress zero results in 'Most popular searches' as presented at the bottom of result listing.

To be selected in Admin settings.


Self test whether all ...s of ../admin/ are writeable. Otherwise a chmod 777 is performed. Malfunction will cause warning messages.


Self test for up to date table structure of MySQL database.


Self test of PDF converter for correct addressing the converter and correct conversion of a test-file. Failures and malfunction will cause warning messages.


Updated PDF converter for non-Latin text, such as Arabic, Cyrillic, Chinese, Greek and Hebrew documents.

With special thanks for the assistance of Daniel Richard, cnmss.fr


New algorithm to create the CAPTCHA in 'Add URL' form.


Function renamed from replace() to replace_string() in .../commonfuncs.php


Bug fixed that prevented highlighting of keywords in the result listing if the full text was shorter than 250 characters

(as defined in Admin settings: 'Maximum length of page summary displayed in search results').


Bug fixed that prevented highlighting of keywords in the result listing for Strict search (!query) if the keyword was found at position 0 of the full text.


Bug fixed that caused a direct jump to the iframe, instead of linking to the calling page, when the link in the result listing was activated.


Bug fixed that prevented long URLs (> 70 characters) from being displayed in the Admin sites view.


Updated Dutch language file. Thanks to Danny von Berg.



Involved files that have been modified / added for this release:

addurl.php

search.php

.../admin/admin.php

.../admin/admin_header.php

.../admin/auth.php

.../admin/auth_db.php

.../admin/configset.php

.../admin/db_config.php

.../admin/index_media.php

.../admin/install_tables.php

.../admin/messages.php

.../admin/spider.php

.../admin/spiderfuncs.php

.../converter/dummy.pdf

.../converter/feed_parser.php

.../converter/pdftotext.exe

.../converter/pdftotext.script

.../converter/xpdfrc

.../include/click_counter.php

.../include/commonfuncs.php

.../include/make_captcha.php

.../include/media_counter.php

.../include/searchfuncs.php

.../include/search_media.php

.../include/common/must_include.txt

.../include/common/must_not_include.txt

.../include/common/not_div.txt

.../include/images/no_fonts.jpg

.../languages/ all files

.../templates/all folders/hdline.jpg

.../templates/all folders/thisstyle.css

.../converter/rss2html.php + rss.html + rss_parser.php => no longer required


Attention: This version requires an updated set of tables in the MySQL database. In order to create the new tables, please run the 'Install all tables' for all databases in 'Database Management / Configure' menu.



Version v.2.1

Release date: September 03, 2009

Compared to Sphider-plus version 2.0, the following items have been added / modified:

 

New item in Admin settings:

Perform a segmentation of Chinese and Korean text during index / re-index procedure.

This divides phrases like 帽子和服装 into the base words 帽子, 和 and 服装,

so that each of them becomes searchable.

Valid for Chinese sites with charset: GB2312, GBK and GB18030

Valid for Korean sites with charset: EUC-KR and ISO10646-1933
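The changelog does not say which segmentation algorithm Sphider-plus uses. The Python sketch below shows one standard dictionary-based technique, backward maximum matching, on the example phrase. It assumes the text has already been decoded from GB2312/GBK/GB18030 into a Unicode string. It also assumes a small hypothetical dictionary. The example is a classic ambiguity: 和服 (kimono) is also a word. Greedy matching from the left would produce 帽子 / 和服 / 装, while matching from the right yields the intended split.

```python
def backward_max_match(text, dictionary, max_len=4):
    """Segment text by repeatedly taking the longest dictionary word from the right."""
    words = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            # Fall back to a single character when no dictionary word matches
            if size == 1 or piece in dictionary:
                words.append(piece)
                end -= size
                break
    words.reverse()
    return words
```

With the dictionary `{"帽子", "和", "服装", "和服"}`, the phrase 帽子和服装 is segmented into 帽子, 和 and 服装. Production segmenters usually add word frequencies or statistical models to resolve such cases more reliably.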

 

New item in Admin settings:

Index password protected sites.

If enabled, Sphider-plus will also index .htaccess protected sites (basic authentication).

Up to 3 different zones can be registered in Admin settings and will be indexed.

 

New options in Admin settings:

- Index framesets

- Index iframes

If enabled, both options will index html and image frames.

Not available for dynamically reloaded frames (e.g. by JavaScript).

 

New item in Admin settings:

Enable decoding of BBCode into standard HTML during index / re-index.

If selected, code like

[url=http://abc.de/][b]abc.de[/b][/url]

will be converted to

<a href="http://abc.de"><strong>abc.de</strong></a>
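A minimal regex-based BBCode decoder is sketched below in Python. It covers only the tags `[b]`, `[i]` and `[url]`, which is an assumption, since the changelog does not list the supported tags. Rules are applied one after another, so nested tags like the example above are handled. Unlike the example output, this sketch keeps the URL exactly as written, including a trailing slash, and it does not escape quotes inside URLs.

```python
import re

# (pattern, replacement) pairs; re.S lets a tag span line breaks
BB_RULES = [
    (re.compile(r"\[b\](.*?)\[/b\]", re.S | re.I), r"<strong>\1</strong>"),
    (re.compile(r"\[i\](.*?)\[/i\]", re.S | re.I), r"<em>\1</em>"),
    (re.compile(r"\[url=([^\]]+)\](.*?)\[/url\]", re.S | re.I), r'<a href="\1">\2</a>'),
    (re.compile(r"\[url\](.*?)\[/url\]", re.S | re.I), r'<a href="\1">\1</a>'),
]

def bbcode_to_html(text):
    for pattern, replacement in BB_RULES:
        text = pattern.sub(replacement, text)
    return text
```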

 

New item in Admin settings:

Enable decoding of entity-coded sites into standard characters.

If selected, entity-coded text like &#268;apek and D&#246;hl

will be converted to Čapek and Döhl
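For reference, the decoding step itself is standard. Python's `html.unescape` handles named (`&ouml;`), decimal (`&#246;`) and hexadecimal (`&#xF6;`) character references, as the sketch below shows. Sphider-plus does this in PHP, so the Python wrapper name is illustrative only.

```python
import html

def decode_entities(text):
    """Decode named, decimal and hexadecimal HTML character references."""
    return html.unescape(text)
```

`decode_entities("&#268;apek and D&#246;hl")` returns `Čapek and Döhl`. Note that text is decoded only once, so a double-encoded `&amp;#246;` becomes the literal `&#246;`.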

 

New options in Admin settings:

- Use a whitelist to index / re-index only those pages

  that include any of the words in the whitelist

- Use a whitelist to index / re-index only those pages

  that include all of the words in the whitelist
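The two whitelist modes differ only in whether `any` or `all` listed words must occur on the page. The Python sketch below assumes case-insensitive whole-word matching. It also treats an empty whitelist as "allow everything". Both are assumptions, since the changelog does not specify either behavior.

```python
import re

def passes_whitelist(page_text, whitelist, mode="any"):
    """Return True if the page may be indexed under the given whitelist mode."""
    if not whitelist:
        return True  # assumption: an empty whitelist blocks nothing
    words = set(re.findall(r"\w+", page_text.lower()))
    hits = [w.lower() in words for w in whitelist]
    return all(hits) if mode == "all" else any(hits)
```

For the page text "Sphider-plus is a PHP search engine", the whitelist `["php", "mysql"]` passes in "any" mode but fails in "all" mode.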

 

Improved 'Follow sitemap.xml' procedure:

If <sitemapindex . . > is detected in a sitemap.xml file, and if multiple Sitemap files are available,

Sphider-plus will process the secondary Sitemaps and extract all links for index / re-index.

Also gzip-compressed files (Index Sitemap files as well as the Sitemap files) will be processed.
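The Sitemap handling above can be sketched in Python using only the standard library. A file is gunzipped if it starts with the gzip magic bytes. Its root element then tells whether it is a `sitemapindex`, whose child Sitemaps are processed recursively, or a `urlset` of page links. The `fetch` callable stands in for the HTTP download and is hypothetical. The namespace is the one defined by the sitemaps.org protocol.

```python
import gzip
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(data: bytes):
    """Return ('sitemapindex' or 'urlset', list of <loc> URLs)."""
    if data[:2] == b"\x1f\x8b":              # gzip magic number
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    kind = root.tag.split("}")[-1]           # strip the XML namespace
    locs = [e.text.strip() for e in root.findall(".//sm:loc", NS) if e.text]
    return kind, locs

def collect_links(url, fetch, max_depth=2):
    """Follow Sitemap index files down to the page links they reference."""
    kind, locs = parse_sitemap(fetch(url))
    if kind == "sitemapindex":
        if max_depth == 0:
            return []                        # guard against endless index chains
        links = []
        for child in locs:
            links.extend(collect_links(child, fetch, max_depth - 1))
        return links
    return locs
```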

 

Improved index / re-index procedure:

If the charset of a site to be indexed cannot be detected, because the page does not conform to the HTML standard

or lacks the corresponding HTML tag, the index procedure is no longer interrupted.

The preferred charset defined in Admin settings is used for the affected link.

 

Improved index / re-index procedure:

If Sphider-plus is redirected by HTTP 301 or 302, links found on the target site

will also be followed.

 

For new sites, the spider depth is now set to 'full' by default.

 

Improved UTF-8 support:

Conversion into the UTF-8 charset is now mandatory.

 

Improved index and re-index procedures for Cyrillic and Greek languages to support upper and lower case characters.

 

Bug fixed that prevented suspended index procedures from being continued.

 

'Continue suspended index procedure' enabled now also for 'Re-index' and 'Erase & Re-index'.

 

Improved search functions for search with wildcards and for strict search.

 

Improved category search:

- Selected category name is highlighted in headline of result listing.

- If activated in Admin settings, other categories that would also deliver results

   are presented individually for each result link in the result listing.

- If a search in a category is performed, sub-categories that would also deliver results

   are presented individually for each result link in the result listing.

 

If media search is enabled in Admin settings, text search with wildcards will also present media results.

 

Improved search utility:

Queries with and without hyphen will deliver the same results,

so that queries like 'make-up' and 'make up' are treated equally.

The same applies to queries containing dots, commas and question marks.

 

Maximum length for site and link URLs to be indexed is now increased to 1024 characters.

 

Maximum length for link 'title' increased to 255 characters.

 

Code rewritten to cooperate with PHP 5.3.x

 

Error corrected in de-language file. Thanks to Carl D. Erling

 

 

Involved files that have been modified / added for this release:

Nearly all, because of PHP 5.3 compatibility.

 

In order to enable the two new items:

- For new sites, as per default, the spider-depth is set to 'full'.

- URLs will be accepted for a length of up to 1024 characters.

this release requires the installation of new table sets for each database.



Version v.2.0

Release date: May 27, 2009

Compared to Sphider-plus version 1.9, the following items have been added / modified:

 

Multiple database support for up to 5 independent databases (expandable).

Individual activation of one database for:

- Admin

- Search user

- Suggest URL

For more details, please see chapter Multiple database support

 

Independent configuration and activation for each database is integrated into the Admin interface.

 

Additional password protected access permission for database configuration, independent from Admin login.

 

Integrated availability check for all databases and their release relevant table structure.

 

Individual for each database:

- Backup and restore

- Copy / Move from each database to any other database

 

32 MByte query cache for MySQL database.

- To be activated in Admin settings.

- Status of cache is observable in Admin / Statistics / Server-Info / MySQL.

   (Cache might not work for 'Shared Hosting' applications)

 

Obey the tag specification:

rel="canonical"

If defined in page header of a website, the crawler will be redirected to the canonical link and Sphider-plus will understand that the duplicates all refer to the canonical URL.

For more details, please see chapter Canonical <link> tag

 

Index websites that are created with ASP.NET

 

Definition for path to PDF converter integrated into Admin Settings interface. Additionally the default setting - as required for the Operating System environment - is suggested.

 

If path to PDF converter is invalid and converter is not accessible, an error message (in Admin Settings dialog) is created.

 

Additional Admin setting to optionally enable indexing of externally hosted media content.

 

Improved index procedure of media files, by avoiding indexing of duplicate media content.

 

Improved image indexing by reducing the required download time.

 

Improved index / re-index procedure to avoid 'MySQL server has gone away' messages.

 

prototype.js framework adapted to cooperate with XHTML valid parameter handling.

 

XHTML1.0 output for

- Admin interface

- Search form and Result listing

- Suggest URL form

 

Improved vulnerability check of User input and Admin log-in:

- Prevent buffer overflow errors.

- Suppress JavaScript execution and tag inclusions masked as XSS attacks.

- Prevent C-function 'format-string' vulnerability.

 

The 'URL Suggestion Form' now includes a character counter for remaining input in 'title' and 'description' field.

 

Phrase search is enabled now also for title tags, not only for full text.

 

Improved suggest framework:

For search in categories, the suggestions now will be presented with respect to the pre-selected category.

 

For 'Search with wildcards' now the complete word is highlighted in result listing. Not only the query part of the found keyword.

 

Additional Admin setting in section 'Suggest Options':

For 'Media search' get suggestions also from EXIF info and ID3 tags.

 

Files for database setting and script configuration are protected now against direct client access by pre-defining a named constant.

 

Updated Swedish language file. Thanks to Holger Gremminger.

 

Bug fixed in 'Search for suggestions in query log', which prevented this option from being disabled.

 

Bug fixed that caused multiple listing of the same result, when

"Define maximum count of result hits per page, displayed in

search results (if multiple occurrence is available on a page)"

was activated.

 

Involved files that have been modified / added for this release:

Nearly all scripts.

 

Attention: This release requires a fresh installation of all scripts and a blank MySQL database. An update from former Sphider-plus versions or an upgrade from the original Sphider is not supported.

For more details, please see the chapter Installation of Sphider-plus version 2.0


