
The VB version of the Blog Crawler

This is the VB.Net 2005 version of the Blog Crawler. It’s based on the Foxpro version, but it uses SQL Server Everywhere so you can deploy it on your mobile device! It crawls a blog and stores all entries, including blog comments and Cascading Style Sheets, in a SQL Server Everywhere table.

I had to wait to post this blog entry because the public release of the SQL Everywhere CTP is today (announced at Tech Ed)!

 

To run it, you only need to copy a few files from this link (1.6 megabytes) into a directory on your machine and start BlogCrawl.Exe. No registration or installation of any kind is required, other than the .NET Framework 2.0 (which is installed with Visual Studio 2005, or you can download the runtime). The source code is here and can be unzipped into the same folder. The program (including SQL/E) is totally isolated to the install folder, except for the My.Settings XML file, which stores your preferences in your local settings folder. It doesn’t touch your registry or install any other files.

 

When you start the program, the top part shows a grid of already crawled blog posts. The bottom part shows each post in a web control as it looked at the time of download. The links on the page are live. When first starting, there will be no data. If you click the Crawl button, it will start a background thread that scans the blog and downloads any entries that have not been downloaded yet. The status bar shows crawl progress.

 

It takes about 20 minutes to crawl my blog and download my 240 posts. You can stop and continue the background thread at any time by clicking the same Crawl button. The data is stored as a SQL Mobile database in the same folder, in a file called <blogname>.sdf.
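
To make the start/stop behavior concrete, here is a minimal sketch of a crawl that runs on a background thread, skips entries that are already downloaded, and is toggled by a single button. It is written in Python rather than VB.Net and is not the program's actual source: the CrawlController class, its method names, and the fetch callable are all hypothetical.

```python
import threading


class CrawlController:
    """Hypothetical stand-in for the Crawl button: one toggle starts or stops the crawl."""

    def __init__(self, links, fetch, already_downloaded=None):
        self.pending = list(links)                        # links still to visit
        self.downloaded = dict(already_downloaded or {})  # url -> stored page
        self.fetch = fetch                                # callable(url) -> page content
        self._stop = threading.Event()
        self._thread = None

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    def request_stop(self):
        """Ask the background thread to stop after the entry it is working on."""
        self._stop.set()

    def toggle(self):
        """Start the crawl if it is idle; stop it if it is running."""
        if self.is_running():
            self.request_stop()
            self._thread.join()
        else:
            self._stop.clear()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def wait(self):
        """Block until the current crawl thread, if any, has finished."""
        if self._thread is not None:
            self._thread.join()

    def _run(self):
        # Runs on the background thread. Unvisited links stay in self.pending,
        # so a later toggle() continues where this run left off.
        while self.pending and not self._stop.is_set():
            url = self.pending.pop(0)
            if url in self.downloaded:
                continue  # already downloaded on an earlier crawl
            self.downloaded[url] = self.fetch(url)
```

Because the pending list survives a stop, clicking the button again simply resumes with the links that have not been visited yet.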

 

You can type a search string in the textbox and click the Search button to limit the grid to those posts containing the search string.
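
A search like this boils down to a substring query against the stored posts. The sketch below shows the idea in Python, with an in-memory SQLite database standing in for SQL Server Everywhere; the posts table and its url, title, and content columns are assumptions, not the program's real schema.

```python
import sqlite3


def create_posts_table(conn):
    # Hypothetical schema; the real .sdf file may be laid out differently.
    conn.execute("CREATE TABLE posts (url TEXT PRIMARY KEY, title TEXT, content TEXT)")


def search_posts(conn, text):
    """Return (url, title) rows whose stored content contains the search string."""
    # Escape LIKE wildcards so the search string is matched literally.
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT url, title FROM posts WHERE content LIKE ? ESCAPE '\\' ORDER BY url",
        ("%" + escaped + "%",),
    )
    return cur.fetchall()
```

Passing the search string as a parameter, rather than concatenating it into the SQL text, keeps quotes in the search box from breaking the query.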

 

It’s customized for blogs hosted on http://blogs.msdn.com: it parses out the publication date of each blog entry and determines which pages are blog posts and which are just intermediate pages (like the list of February posts). I haven’t tested it with all the various blog CSS styles, but the source can be modified.
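
One way to tell a post from an intermediate page is to look at the shape of the URL. The Python sketch below assumes that a post URL ends in /archive/YYYY/MM/DD/<name>.aspx and that anything else is an intermediate page; that pattern, the POST_RE name, and the post_date function are assumptions for illustration, and the real program may well read the date from the page itself.

```python
import re
from datetime import date

# Assumed shape of a single-post URL: .../archive/YYYY/MM/DD/<name>.aspx
POST_RE = re.compile(r"/archive/(\d{4})/(\d{2})/(\d{2})/[^/]+\.aspx$", re.IGNORECASE)


def post_date(url):
    """Return the date encoded in a post URL, or None for any other page."""
    match = POST_RE.search(url)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # digits matched, but they do not form a real calendar date
```

A monthly archive page with a URL such as .../archive/2006/02.aspx has no day segment, so it falls through to None and can be followed for links without being stored as a post.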

 

The program defaults to crawling my blog, but allows you to switch to other blogs. Click the Blog Options button to crawl your favorite blog.

 

If you change the Followed value for a particular entry to 0, the next crawl will recrawl that link. This is useful if you want to pick up the latest comments.
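
The Followed value acts as a per-link "already crawled" flag. Here is a minimal Python sketch of that idea, again with SQLite standing in for SQL Server Everywhere; the links table, its columns, and the function names are hypothetical.

```python
import sqlite3


def create_links_table(conn):
    # Hypothetical schema: one row per discovered link, followed = 0 until crawled.
    conn.execute(
        "CREATE TABLE links (url TEXT PRIMARY KEY, followed INTEGER NOT NULL DEFAULT 0)"
    )


def links_to_crawl(conn):
    """Links the next crawl will visit: those whose followed flag is 0."""
    rows = conn.execute("SELECT url FROM links WHERE followed = 0 ORDER BY url")
    return [row[0] for row in rows]


def mark_followed(conn, url):
    """Record that a link has been crawled so later crawls skip it."""
    conn.execute("UPDATE links SET followed = 1 WHERE url = ?", (url,))


def request_recrawl(conn, url):
    """Reset the flag so the next crawl downloads the link again."""
    conn.execute("UPDATE links SET followed = 0 WHERE url = ?", (url,))
```

Resetting the flag for one entry leaves every other row untouched, so the next crawl only downloads that one link again.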

 

It uses the new My.Settings feature to persist user settings, such as window position and which blog was last crawled. The new SplitContainer class allows you to move the splitter bar between the grid and the web control, and the SplitterDistance is persisted in My.Settings.

 

One of my machines was playing a sound while my web crawler was crawling. The culprit was Control Panel->Sounds->Sound->Windows Explorer->Information Bar.

 

 

See also

SQL Mobile Books Online

Use Regular Expressions to get hyperlinks in blogs

 

 

Published Monday, June 12, 2006 5:25 AM by Calvin_Hsia


Comments

# re: The VB version of the Blog Crawler

Tuesday, June 13, 2006 3:05 PM by Alan Stevens
Calvin,

I've been talking to Steve Lasker and other members of the SQL Everywhere team at TechEd.  They mentioned that they do not support ODBC.  What are the implications for using existing SQL Pass-Through code against SQL Everywhere?

++Alan

# re: The VB version of the Blog Crawler

Thursday, June 15, 2006 11:54 AM by Michael S. Kaplan
Cool app, Calvin!

I noticed mine takes a bit longer than 20 minutes, should I file a  bug somewhere? :-)

# Customizing the Blog Crawler for different formats

Thursday, June 15, 2006 3:04 PM by Calvin Hsia's WebLog
I’ve had several requests that require customizing the Blog Crawler.
The entire source code...

# Calvin Hsia, Tech Lead from the Fox Pro team, posts a SQL Server Everywhere Blog sample

Tuesday, June 20, 2006 7:45 PM by Steve Lasker's Web Log
The best validation I've seen for SQL Server Everywhere is when Calvin Hsia, Technical Lead from Fox...

# Why Not SQL Express

Thursday, July 06, 2006 4:13 PM by noahc
Why not use SQL express?  :)

# Unacceptable URL?

Thursday, July 06, 2006 4:29 PM by noahc
It wouldn't crawl this site.  It kept cutting off the /coad portion.  Wuzzup?
http://msmvps.com/blogs/coad

# All Fixed Now!

Monday, July 10, 2006 10:47 PM by noahc
Hey Calvin, thanks for all the fixes you made to support http://msmvps.com/blogs/coad!  This is a very handy tool and something I've been wishing for a long time.  Again, thank you!

# Blog Backup Tool - Blog Crawler

Monday, July 10, 2006 10:53 PM by Noah Coad's Code
I’ve been looking for awhile for a way to back up my blog by capturing each post in a nice, MHTML...

# Blog Crawler! And Backup Tool

Tuesday, July 11, 2006 2:29 AM by ѕcrарраd
Check out this awesome little utility I found. Its not something I would use all the time, but I you...

# Calvin's got another cool utility

Tuesday, July 11, 2006 8:36 PM by yag: Community and Architecture
Calvin has written a blog crawler with both VFP and VB.NET versions that allows you to back up your own...

# Use a different kind of grid in your applications

Tuesday, July 18, 2006 9:34 PM by Calvin Hsia's WebLog
My prior post (Create a .Net UserControl that calls a web service that acts as an ActiveX control to...

# Use a different kind of grid in your applications » Wagalulu - Microsoft » » Use a different kind of grid in your applications

# Business Analyst - eCommerce - Eaton - Eden Prairie, MN | Business Source

# The Web’s Best Interface Design | Business Source

Wednesday, June 20, 2007 3:45 AM by The Web’s Best Interface Design | Business Source

# Create your own web browser on your SmartPhone

Monday, October 01, 2007 4:42 PM by Calvin Hsia's WebLog

Windows Mobile 5.0 comes with a Web Browser (v6 is due out any day now). It runs on Pocket PCs and SmartPhones.

# Persist user form size and location settings per session

Thursday, April 03, 2008 12:36 PM by Calvin Hsia's WebLog

My prior post ( Create your own Test Host using XAML to run your unit tests ) shows how to create a form

# How to interrupt your code

Thursday, May 15, 2008 10:44 PM by Calvin Hsia's WebLog

I received a question: Simply, is there a way of interrupting a vfp sql query once it has started short
