In this article we will create a Search Content Source for a web site.
What is the Goal?
Our goal is to make the content of the following blog searchable from SharePoint 2013.
Please note that the above is a reference web site. You can use your own web site instead, provided it has a valid robots.txt file.
The following steps are involved.
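Before pointing a content source at a site, it can help to confirm that its robots.txt does not block the pages you expect to index. The following is a minimal Python sketch using the standard library's robots.txt parser. The robots.txt content, the user agent name "ExampleCrawler" and the example.com URLs are assumptions made for illustration only; they are not the values SharePoint uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines,
    # so no network access is needed for this check.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/blog/post.html",
                "http://example.com/private/page.html"):
        allowed = is_crawl_allowed(SAMPLE_ROBOTS_TXT, "ExampleCrawler", url)
        print(url, "allowed" if allowed else "blocked")
```

To check a real site, paste the content of its robots.txt file into the sample string and use the URLs you want crawled.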
Step 1: Create Content Source
Open Central Administration > Service Applications > Search Service Application > Content Sources
Create a new content source and enter the following information.
Click OK to save changes.
Step 2: Crawl
Now choose the Full Crawl option for the content source.
Wait a few minutes for the crawl to complete.
SharePoint accesses the home page through the URL, parses its contents, reads the metadata, extracts the URLs it finds and follows them to discover more content. Together these activities make up the indexing.
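The crawl described above can be pictured as a breadth-first walk over links. The following Python sketch is a toy illustration of that idea, not SharePoint's actual crawler: the in-memory PAGES dictionary, the example.com URLs and the max_depth parameter are all assumptions introduced for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl; returns {url: depth} for every URL discovered.

    fetch is any callable that returns the HTML for a URL, or None.
    Pages at max_depth are recorded but their links are not followed.
    """
    discovered = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = discovered[url]
        html = fetch(url)
        if html is None or depth >= max_depth:
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in discovered:
                discovered[target] = depth + 1
                queue.append(target)
    return discovered


# Hypothetical in-memory web site, used instead of real HTTP requests.
PAGES = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/c.html">C</a>',
    "http://example.com/b.html": "no links here",
    "http://example.com/c.html": '<a href="/d.html">D</a>',
}

result = crawl("http://example.com/", PAGES.get, max_depth=2)

if __name__ == "__main__":
    for url, depth in result.items():
        print(depth, url)
```

With max_depth set to 2, the links on c.html are not followed, so d.html is never discovered. This mirrors the way a depth or hop limit on a content source leaves deeper pages out of the index.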
Step 3: View Log
You can check the content source for any crawl errors or warnings that prevent content from showing up in search results.
You will get the following page.
You can click on the links to view each error or warning. Ignore the non-serious ones.
Step 4: Search
Open the Enterprise Search Center site and type the following text.
The results should include pages from the blog URL above. This confirms that our web content source is configured correctly.
In real-world scenarios things will not always go this smoothly. You may encounter the following issues, and I have provided some links to help resolve them.
You can view these errors from the Content Source > View Crawl Log menu.
Error: Item not crawled due to one of the following reasons: Preventive crawl rule; Specified content source hops/depth exceeded; URL has query string parameter; Required protocol handler not found; Preventive robots directive
Solution 1: If the URL contains query strings, configure Crawl Rules > http://bit.ly/1k1sIKt
Solution 2: If the source is hosted on the same machine, look at the loopback check > http://support.microsoft.com/kb/896861/en-us
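One of the reasons listed in the error above is a query string in the URL. As a quick way to see which URLs are affected, the following Python sketch flags URLs that carry a query string. The function name has_query_string and the sample URLs are hypothetical and only serve as an illustration; the actual fix is still made in the Crawl Rules.

```python
from urllib.parse import urlsplit


def has_query_string(url: str) -> bool:
    """Return True if the URL contains a non-empty query string."""
    return bool(urlsplit(url).query)


if __name__ == "__main__":
    # Hypothetical URLs, for example copied from a crawl log.
    urls = [
        "http://example.com/post",
        "http://example.com/post?id=7",
    ]
    for url in urls:
        if has_query_string(url):
            print("query string found:", url)
```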
Error: The content for this address was excluded by the crawler because this item was marked with a no-index meta-tag. To index this item, remove the meta-tag and recrawl.
Solution 1: If the source is an external web site, check its robots.txt file > http://bit.ly/PomtFg
Solution 2: If the source is a SharePoint site or library > http://bit.ly/1i99dBs
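To confirm whether a page really carries a no-index meta-tag, you can inspect its HTML. The following Python sketch looks for a robots meta tag whose content includes noindex. The class and function names and the sample HTML are assumptions made for illustration; the sketch only checks the generic robots meta tag and may not cover every way a page can be excluded from indexing.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Sets noindex to True when <meta name="robots"> contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        directives = [d.strip().lower()
                      for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


if __name__ == "__main__":
    # Hypothetical page source, used only for illustration.
    sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(sample))
```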
As a general measure, I recommend applying SharePoint Cumulative Updates and operating system service packs to the servers.
In this article we have explored how to create a Web Content Source in SharePoint 2013.