Search Content Source to Website

In this article we will create a Search Content Source for a website.

What is the Goal?

Our goal is to make the content of the following blog searchable from SharePoint 2013.

· https://futurecapsblog.wordpress.com/

image

Please note that the above is a reference website. You can use your own website instead, provided it has a valid robots.txt file.
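Before pointing the crawler at a site, it can help to confirm that its robots.txt does not block the pages you want indexed. Below is a minimal sketch using Python's standard urllib.robotparser. The sample robots.txt content, the "*" user agent and the helper name is_crawl_allowed are assumptions made for illustration; they are not taken from the blog above or from SharePoint itself.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

if __name__ == "__main__":
    for url in (
        "https://futurecapsblog.wordpress.com/",
        "https://futurecapsblog.wordpress.com/wp-admin/",
    ):
        print(url, "allowed:", is_crawl_allowed(SAMPLE_ROBOTS_TXT, url))
```

To check a live site, you would download its robots.txt first and pass the text to the same helper.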

Steps

The following steps are involved.

Step 1: Create Content Source

Open Central Administration > Service Applications > Search Service Application > Content Sources

image

Create a new content source and enter the following information. For a website, the content source type is Web Sites and the start address is the site URL.

image

Click OK to save changes.

Step 2: Crawl

Now choose the Full Crawl option for the content source.

image

Wait a few minutes for the crawl to complete.

image

SharePoint accesses the home page through the URL, parses its content, reads the metadata, extracts the URLs it finds, and follows them to discover more content. Together, these steps make up the indexing process.
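The crawl described above can be pictured as a breadth-first traversal with a depth limit. The sketch below is not SharePoint's crawler; it only illustrates the idea with Python's standard library. The in-memory PAGES site, the example.com URLs and the names extract_links and crawl are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs (fragments removed) found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch, max_depth=2):
    """Visit pages breadth-first; fetch maps a URL to HTML text or None."""
    seen = {start_url}
    visited = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory site, so the sketch runs without network access
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/c">C</a>',
    "https://example.com/b": '<a href="/">Home</a>',
    "https://example.com/c": "<p>Leaf page</p>",
}

if __name__ == "__main__":
    print(crawl("https://example.com/", PAGES.get, max_depth=1))
```

The depth limit plays a role similar to the hops/depth setting mentioned in the crawl log messages: pages further away than the limit are simply never requested.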

Step 3: View Log

You can check the content source for any crawl errors or warnings that prevent content from showing up in search results.

image

You will get the following page.

image

Click the links to view each error or warning. You can ignore the non-serious ones.

Step 4: Search

Open the Enterprise Search Center site and type the following text.

image

The results should include pages from the above blog URL. This confirms that our web content source configuration works.
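The same check can also be made without the browser, through the SharePoint 2013 Search REST endpoint (/_api/search/query). The sketch below only builds the request URL; it does not send the request, and a real call would also need authentication against your farm. The site URL, the search term and the helper name build_search_url are assumptions, and the sketch assumes the term contains no single quotes.

```python
from urllib.parse import quote


def build_search_url(site_url: str, query_text: str) -> str:
    """Build a Search REST query URL for the given site and search term."""
    base = site_url.rstrip("/")
    # The querytext value is passed as a single-quoted string literal
    literal = "'" + query_text + "'"
    return base + "/_api/search/query?querytext=" + quote(literal, safe="")


if __name__ == "__main__":
    # Hypothetical Search Center URL and search term
    print(build_search_url("https://sharepoint.example.com/search/", "futurecaps blog"))
```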

Challenges

In real-world scenarios, things rarely go this smoothly. You may encounter the following issues; I have included some links that help resolve them.

You can view these errors from the Content Source > View Crawl Log menu.

Item not crawled due to one of the following reasons: Preventive crawl rule; Specified content source hops/depth exceeded; URL has query string parameter; Required protocol handler not found; Preventive robots directive

Solution 1: If the URL contains query strings, create a crawl rule for it > http://bit.ly/1k1sIKt

Solution 2: If the source is hosted on the same machine, review the loopback check setting > http://support.microsoft.com/kb/896861/en-us
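One of the reasons listed in the message above is a URL with a query string parameter. To see quickly which of your URLs fall into that group, a small helper like the one below can sort them. This is a sketch using Python's urllib.parse; the function names and the sample URLs are hypothetical, and the check is only an approximation of how the crawler decides.

```python
from urllib.parse import urlsplit


def has_query_string(url: str) -> bool:
    """Return True if the URL carries a non-empty query string."""
    return bool(urlsplit(url).query)


def split_by_query_string(urls):
    """Split URLs into (without_query, with_query) lists, keeping order."""
    without_query, with_query = [], []
    for url in urls:
        (with_query if has_query_string(url) else without_query).append(url)
    return without_query, with_query


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only
    sample = [
        "https://example.com/about/",
        "https://example.com/?p=12",
        "https://example.com/post#comments",
    ]
    plain, needs_rule = split_by_query_string(sample)
    print("Crawlable as-is:", plain)
    print("May need a crawl rule:", needs_rule)
```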

The content for this address was excluded by the crawler because this item was marked with a no-index meta-tag. To index this item, remove the meta-tag and recrawl.

Solution 1: If the source is an external website, check its robots.txt > http://bit.ly/PomtFg

Solution 2: If the source is a SharePoint site or library > http://bit.ly/1i99dBs
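To confirm whether a page really carries the no-index meta-tag mentioned in the message above, you can inspect its HTML. The following is a minimal sketch with Python's html.parser; the class and function names and the sample markup are hypothetical, and it only looks at the generic robots meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def is_noindex(html: str) -> bool:
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow"
    return "noindex" in parser.directives or "none" in parser.directives


if __name__ == "__main__":
    # Hypothetical page markup, for illustration only
    page = '<html><head><meta name="robots" content="NOINDEX, nofollow"></head></html>'
    print("Excluded from index:", is_noindex(page))
```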

As a general measure, I recommend applying the latest SharePoint cumulative updates and operating system service packs to the servers.

References

http://technet.microsoft.com/en-us/library/jj219808(v=office.15).aspx

Summary

In this article we have explored how to create a Web Content Source in SharePoint 2013.
