Search for a Life Partner: A Software Architect's Experiences, Part 1

Searching for an ideal life partner is a critical task. I went through this process and figured out that women are like an onion: layers [chelka (cover)], then more layers, and at the end you are left with nothing in hand... :)

I looked at a few online matchmaking services. There are many competing ones, but the best free service I found is http://www.shaadionline.tv by Geo.
I tried to register through their online registration process, but did not get any confirmation even after weeks. I emailed them about it, but in vain.

So I encountered two problems.
1. Their search does not let you search on the basis of good habits, bad habits, or profile description. Even what was available was scattered across 4 search types. Overall, it was a very user-unfriendly search.
[Reference: http://www.shaadionline.tv/search.asp]
2. I needed an account to log in to their system.

To solve the first problem, I decided to perform data scraping on their web site.
The basic strategy is to traverse the profile pages in sequence, download each page, split the page content to extract the meaningful data, and then save that data into a database table.
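
The traversal itself can be sketched roughly as follows. This is only a sketch, assuming profile pages are addressed by a sequential integer ID (the `i` parameter of ProcessWebPage suggests this). The names ScrapeLoopSketch, Run, download and process are made up for illustration and are not part of the original project.

```csharp
using System;

public static class ScrapeLoopSketch
{
    // Walks the IDs from firstId to lastId. The download and process steps are
    // passed in as delegates (hypothetical signatures) so the loop stays
    // independent of the web site and the database.
    // Returns the number of pages that were handed to process without error.
    public static int Run(int firstId, int lastId,
                          Func<int, string> download,
                          Action<string, int> process)
    {
        int processed = 0;
        for (int id = firstId; id <= lastId; id++)
        {
            try
            {
                string html = download(id);
                if (string.IsNullOrEmpty(html))
                    continue; // nothing came back for this ID

                process(html, id);
                processed++;
            }
            catch (Exception ex)
            {
                // One broken page should not stop the whole run.
                Console.WriteLine("Profile {0} skipped: {1}", id, ex.Message);
            }
        }
        return processed;
    }
}
```

In the project, the two delegates would wrap DownloadWebPage and ProcessWebPage. The URL pattern of a profile page is not given here, so it would have to be supplied by whoever runs the loop.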
Follow these steps:
1. After analyzing the data structure of a candidate profile, I created a basic Profiles table in a new database:

CREATE TABLE [dbo].[Profiles](
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[GID] [int] NULL,
	[Name] [varchar](500) NULL,
	[ProfileID] [varchar](500) NULL,
	[PostedBy] [varchar](500) NULL,
	[ProfileCreated] [varchar](500) NULL,
	[Sex] [varchar](50) NULL,
	[Age] [int] NULL,
	[MarrStatus] [varchar](500) NULL,
	[Religion] [varchar](500) NULL,
	[Language] [varchar](500) NULL,
	[City] [varchar](500) NULL,
	[Country] [varchar](500) NULL,
	[BornIn] [varchar](500) NULL,
	[Citizen] [varchar](500) NULL,
	[Height] [varchar](500) NULL,
	[Waist] [varchar](500) NULL,
	[Complexion] [varchar](500) NULL,
	[Looks] [varchar](500) NULL,
	[Disability] [varchar](500) NULL,
	[Star] [varchar](500) NULL,
	[Sect] [varchar](500) NULL,
	[Caste] [varchar](500) NULL,
	[DependentMembers] [int] NULL,
	[PreferredSetup] [varchar](500) NULL,
	[ReligiousLevel] [varchar](500) NULL,
	[FirstChoiceofFood] [varchar](500) NULL,
	[TVProgram] [varchar](500) NULL,
	[PreferredHoneymoonPlace] [varchar](500) NULL,
	[Education] [varchar](500) NULL,
	[Occupation] [varchar](500) NULL,
	[MonthlyIncome] [varchar](500) NULL,
	[EducationDetails] [varchar](500) NULL,
	[MoreAboutMe] [varchar](500) NULL,
	[MyHabits] [varchar](500) NULL,
	[MyBadHabits] [varchar](500) NULL,
 CONSTRAINT [PK_Profiles] PRIMARY KEY CLUSTERED 
(
	[ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

2. Create a new WPF project in Visual Studio, add a LINQ to SQL project item, add the table created above to the LINQ to SQL designer, and compile the project once.
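
Once the table is filled, the kind of search the site does not offer (by good habits and bad habits) becomes a simple filter. The sketch below runs over an in-memory list rather than the real database. ProfileRow is a hypothetical stand-in for the designer-generated Profile class and reproduces only the columns it needs; HabitSearchSketch and Search are likewise illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the generated Profile class (assumption: only these columns).
public class ProfileRow
{
    public string Name { get; set; }
    public string MyHabits { get; set; }
    public string MyBadHabits { get; set; }
}

public static class HabitSearchSketch
{
    // Keeps profiles whose good habits mention `wanted` and whose bad habits
    // do not mention `unwanted` (case-insensitive substring match).
    public static List<ProfileRow> Search(IEnumerable<ProfileRow> profiles,
                                          string wanted, string unwanted)
    {
        return profiles
            .Where(p => Mentions(p.MyHabits, wanted) && !Mentions(p.MyBadHabits, unwanted))
            .ToList();
    }

    // Null-safe substring test; scraped columns may be NULL.
    private static bool Mentions(string text, string term)
    {
        return text != null && text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Against the real table the same filter would probably be written as a LINQ to SQL query or a plain SQL LIKE; whether the case-insensitive IndexOf call translates to SQL is not something this sketch relies on.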

3. Analyze the code below.

        /// <summary>
        /// Returns the content of a given web address as a string.
        /// Requires: using System.IO; using System.Net;
        /// </summary>
        /// <param name="Url">URL of the webpage</param>
        /// <returns>Website content</returns>
        public string DownloadWebPage(string Url)
        {
            // Open a connection
            HttpWebRequest WebRequestObject = (HttpWebRequest)WebRequest.Create(Url);
 
            // You can also specify additional header values like 
            // the user agent or the referer:
            WebRequestObject.UserAgent = ".NET Framework/2.0";
            WebRequestObject.Referer = "http://www.example.com/";
 
            // Request the response, open the data stream and read it to the end.
            // The using blocks close the reader, the stream and the response
            // even if reading throws.
            using (WebResponse Response = WebRequestObject.GetResponse())
            using (Stream WebStream = Response.GetResponseStream())
            using (StreamReader Reader = new StreamReader(WebStream))
            {
                return Reader.ReadToEnd();
            }
        }
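 
        /// <summary>
        /// Hypothetical helper, not part of the original code: a sketch of how the
        /// repeated Substring/IndexOf pattern used in ProcessWebPage could be
        /// factored out. Returns the text between startMarker and the first
        /// endMarker that follows it, or null when either marker is missing.
        /// Note one difference from the inline pattern: the end marker is searched
        /// for only after the start marker, not from the beginning of the string.
        /// Requires: using System;
        /// </summary>
        public static string ExtractBetween(string source, string startMarker, string endMarker)
        {
            int start = source.IndexOf(startMarker, StringComparison.Ordinal);
            if (start < 0)
                return null;
            start += startMarker.Length;

            int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
            if (end < 0)
                return null;

            // e.g. ExtractBetween("Posted by <b>Self</b>", "<b>", "</b>") returns "Self"
            return source.Substring(start, end - start);
        }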
 
        /// <summary>
        /// Performs data scraping on a shaadionline.tv profile page.
        /// </summary>
        /// <param name="pageContent">Content of the profile page as a string</param>
        /// <param name="i">ID key used by the shaadionline.tv profile page</param>
        public void ProcessWebPage( string pageContent , int i)
        {
            try
            {
                DataClassesShaadiOnlineDataContext dbx = new DataClassesShaadiOnlineDataContext();
                Profile pro = new Profile();
 
                pro.GID = i;
 
                string NextPart0 = pageContent.Substring(pageContent.IndexOf("Details of"));
                pro.Name = NextPart0.Substring(36, NextPart0.IndexOf("</b>") - NextPart0.IndexOf("<b>") - 3);
 
                string NextPart00 = pageContent.Substring(pageContent.IndexOf("(Profile ID:"));
                pro.ProfileID = NextPart00.Substring(13, NextPart00.IndexOf(")") - NextPart00.IndexOf("sol"));
 
                // Posted by
                string NextPart = pageContent.Substring(pageContent.IndexOf("Posted"));
                pro.PostedBy = NextPart.Substring(NextPart.IndexOf("<b>") + 3, NextPart.IndexOf("</b>") - NextPart.IndexOf("<b>") - 3);
 
                //ProfileCreated
                NextPart = pageContent.Substring(pageContent.IndexOf("Created on:"));
                pro.ProfileCreated = NextPart.Substring(NextPart.IndexOf("<b>") + 3, NextPart.IndexOf("</b>") - NextPart.IndexOf("<b>") - 3);
 
                //Sex
                string NextPart2 = NextPart.Substring(NextPart.IndexOf("<td class=\"normaltext\" height=\"22\" width=\"97%\"><span class=\"details\">"));
                pro.Sex = NextPart2.Substring(NextPart2.IndexOf("<b>") + 3, NextPart2.IndexOf("</b>") - NextPart2.IndexOf("<b>") - 3);
                if (pro.Sex != "She")
                    return;
 
                //Age
                string NextPart3 = NextPart2.Substring(NextPart2.IndexOf("is"));
                pro.Age = Convert.ToInt32(NextPart3.Substring(NextPart3.IndexOf("<b>") + 3, NextPart3.IndexOf("</b>") - NextPart3.IndexOf("<b>") - 3));
 
                //MarrStatus
                string NextPart4 = NextPart3.Substring(NextPart3.IndexOf(","));
                pro.MarrStatus = NextPart4.Substring(NextPart4.IndexOf("<b>") + 3, NextPart4.IndexOf("</b>") - NextPart4.IndexOf("<b>") - 3);
 
                //Religion
                string NextPart5 = NextPart4.Substring(NextPart4.IndexOf(",", 3));
                pro.Religion = NextPart5.Substring(NextPart5.IndexOf("<b>") + 3, NextPart5.IndexOf("</b>") - NextPart5.IndexOf("<b>") - 3);
 
                //Language
                string NextPart6 = NextPart5.Substring(NextPart5.IndexOf("&nbsp;", 5));
                pro.Language = NextPart6.Substring(NextPart6.IndexOf("<b>") + 3, NextPart6.IndexOf("</b>") - NextPart6.IndexOf("<b>") - 3);
 
                //living in
                string NextPart7 = NextPart6.Substring(NextPart6.IndexOf("living in"));
                pro.City = NextPart7.Substring(NextPart7.IndexOf("<b>") + 3, NextPart7.IndexOf("</b>") - NextPart7.IndexOf("<b>") - 3);
 
                //Country
                string NextPart07 = NextPart7.Substring(NextPart7.IndexOf(","));
                pro.Country = NextPart07.Substring(NextPart07.IndexOf("<b>") + 3, NextPart07.IndexOf("</b>") - NextPart07.IndexOf("<b>") - 3);
 
                //BornIn
                string NextPart8 = NextPart7.Substring(NextPart7.IndexOf("was born in"));
                pro.BornIn = NextPart8.Substring(NextPart8.IndexOf("<b>") + 3, NextPart8.IndexOf("</b>") - NextPart8.IndexOf("<b>") - 3);
 
                //Citizen
                string NextPart9 = NextPart8.Substring(NextPart8.IndexOf("is a citizen of"));
                pro.Citizen = NextPart9.Substring(NextPart9.IndexOf("<b>") + 3, NextPart9.IndexOf("</b>") - NextPart9.IndexOf("<b>") - 3);
 
 
                //Age again, read from the Personal Information table (overwrites the value parsed above)
                string NextPart10 = NextPart9.Substring(NextPart9.IndexOf("<td colspan=\"3\" class=\"details\"><b class=\"text\">Personal \r\n                                    Information</b></td>\r\n                                </tr>\r\n                                <tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n"));
                pro.Age = Convert.ToInt32(NextPart10.Substring(NextPart10.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Age</td>\r\n                                  <td width=\"4%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Age</td>\r\n                                  <td width=\"4%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length, NextPart10.IndexOf("&nbsp;</td>\r\n") - (NextPart10.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Age</td>\r\n                                  <td width=\"4%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Age</td>\r\n                                  <td width=\"4%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length)));
 
 
                //Next Height
                // The same start marker is used for both the offset and the length,
                // so the value is extracted whatever the height starts with.
                string heightStart = "<td width=\"42%\" class=\"blocktext\">&nbsp;Height</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">";
                string NextPart11 = NextPart10.Substring(NextPart10.IndexOf(heightStart));
                int heightFrom = NextPart11.IndexOf(heightStart) + heightStart.Length;
                pro.Height = NextPart11.Substring(heightFrom, NextPart11.IndexOf("</td>\r\n                                </tr>\r\n                                ") - heightFrom);
 
                //Next Waist
                string NextPart12 = NextPart11.Substring(NextPart11.IndexOf("<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Waist</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">"));
                pro.Waist = NextPart12.Substring(NextPart12.IndexOf("<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Waist</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Waist</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length, NextPart12.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart12.IndexOf("<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Waist</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Waist</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length));
 
 
                //Next Complexion
                string NextPart13 = NextPart12.Substring(NextPart12.IndexOf("<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Complexion</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">"));
                pro.Complexion = NextPart13.Substring(NextPart13.IndexOf("<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Complexion</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Complexion</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length, NextPart13.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart13.IndexOf("<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Complexion</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\">&nbsp;Complexion</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length));
 
 
                //Next Looks
                string NextPart14 = NextPart13.Substring(NextPart13.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Looks</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">"));
                pro.Looks = NextPart14.Substring(NextPart14.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Looks</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Looks</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length, NextPart14.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart14.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Looks</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Looks</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length));
 
                //Next Disability
                string NextPart15 = NextPart14.Substring(NextPart14.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Disability</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">"));
                pro.Disability = NextPart15.Substring(NextPart15.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Disability</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Disability</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length, NextPart15.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart15.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Disability</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Disability</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length));
 
                //Next Star
                string NextPart16 = NextPart15.Substring(NextPart15.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Star</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">"));
                pro.Star = NextPart16.Substring(NextPart16.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Star</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Star</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length, NextPart16.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart16.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Star</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Star</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length));
 
                //Next Sect
                string NextPart17 = NextPart16.Substring(NextPart16.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Sect</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">"));
                pro.Sect = NextPart17.Substring(NextPart17.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Sect</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Sect</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length, NextPart17.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart17.IndexOf("<td width=\"42%\" class=\"blocktext\">&nbsp;Sect</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\">&nbsp;Sect</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length));
 
                //Next Caste
                string NextPart18 = NextPart17.Substring(NextPart17.IndexOf("<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\" height=\"30\">&nbsp;Caste</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">"));
                pro.Caste = NextPart18.Substring(NextPart18.IndexOf("<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\" height=\"30\">&nbsp;Caste</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\" height=\"30\">&nbsp;Caste</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length, NextPart18.IndexOf("</td>\r\n                                </tr>\r\n                              ") - (NextPart18.IndexOf("<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\" height=\"30\">&nbsp;Caste</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">") + "<td width=\"42%\" class=\"blocktext\" bgcolor=\"FFF7F8\" height=\"30\">&nbsp;Caste</td>\r\n                                  <td width=\"4%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"54%\" class=\"blocktext\">".Length));
 
 
                //Next DependentMembers
                string NextPart19 = NextPart18.Substring(NextPart18.IndexOf("<td width=\"51%\" class=\"blocktext\">Dependent \r\n                                    Members </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.DependentMembers = Convert.ToInt32(NextPart19.Substring(NextPart19.IndexOf("<td width=\"51%\" class=\"blocktext\">Dependent \r\n                                    Members </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Dependent \r\n                                    Members </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart19.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart19.IndexOf("<td width=\"51%\" class=\"blocktext\">Dependent \r\n                                    Members </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Dependent \r\n                                    Members </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length)));
 
 
                //Next PreferredSetup
                string NextPart20 = NextPart19.Substring(NextPart19.IndexOf("<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">Preferred \r\n                                    Setup </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.PreferredSetup = NextPart20.Substring(NextPart20.IndexOf("<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">Preferred \r\n                                    Setup </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">Preferred \r\n                                    Setup </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart20.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart20.IndexOf("<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">Preferred \r\n                                    Setup </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">Preferred \r\n                                    Setup </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
 
 
                //Next ReligiousLevel
                string NextPart21 = NextPart20.Substring(NextPart20.IndexOf("<td width=\"51%\" class=\"blocktext\">Religious \r\n                                    Level </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.ReligiousLevel = NextPart21.Substring(NextPart21.IndexOf("<td width=\"51%\" class=\"blocktext\">Religious \r\n                                    Level </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Religious \r\n                                    Level </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart21.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart21.IndexOf("<td width=\"51%\" class=\"blocktext\">Religious \r\n                                    Level </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Religious \r\n                                    Level </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
 
 
                //Next FirstChoiceofFood
                string NextPart22 = NextPart21.Substring(NextPart21.IndexOf("<td width=\"51%\" class=\"blocktext\">First choice \r\n                                    of Food</td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.FirstChoiceofFood = NextPart22.Substring(NextPart22.IndexOf("<td width=\"51%\" class=\"blocktext\">First choice \r\n                                    of Food</td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">First choice \r\n                                    of Food</td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart22.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart22.IndexOf("<td width=\"51%\" class=\"blocktext\">First choice \r\n                                    of Food</td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">First choice \r\n                                    of Food</td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
 
 
                //Next TVProgram
                string NextPart23 = NextPart22.Substring(NextPart22.IndexOf("<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">TV \r\n                                    program </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.TVProgram = NextPart23.Substring(NextPart23.IndexOf("<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">TV \r\n                                    program </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">TV \r\n                                    program </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart23.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart23.IndexOf("<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">TV \r\n                                    program </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\" bgcolor=\"FFF7F8\">TV \r\n                                    program </td>\r\n                                  <td width=\"5%\" class=\"blocktext\"> \r\n                                    <div align=\"left\">:</div>\r\n                                  </td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
 
                //Next PreferredHoneymoonPlace
                string NextPart24 = NextPart23.Substring(NextPart23.IndexOf("<td width=\"51%\" class=\"blocktext\">Preferred \r\n                                    Honeymoon Place </td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.PreferredHoneymoonPlace = NextPart24.Substring(NextPart24.IndexOf("<td width=\"51%\" class=\"blocktext\">Preferred \r\n                                    Honeymoon Place </td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Preferred \r\n                                    Honeymoon Place </td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart24.IndexOf("</td>\r\n                                </tr>\r\n                              ") - (NextPart24.IndexOf("<td width=\"51%\" class=\"blocktext\">Preferred \r\n                                    Honeymoon Place </td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Preferred \r\n                                    Honeymoon Place </td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
 
                ///Career
                ///Education
                string NextPart25 = NextPart24.Substring(NextPart24.IndexOf("<td width=\"51%\" class=\"blocktext\">Education</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.Education = NextPart25.Substring(NextPart25.IndexOf("<td width=\"51%\" class=\"blocktext\">Education</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Education</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart25.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart25.IndexOf("<td width=\"51%\" class=\"blocktext\">Education</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Education</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
                pro.Education = pro.Education.Trim();
 
                ///Occupation
                string NextPart26 = NextPart25.Substring(NextPart25.IndexOf("<td width=\"51%\" class=\"blocktext\">Occupation</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.Occupation = NextPart26.Substring(NextPart26.IndexOf("<td width=\"51%\" class=\"blocktext\">Occupation</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Occupation</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart26.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart26.IndexOf("<td width=\"51%\" class=\"blocktext\">Occupation</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Occupation</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
                pro.Occupation = pro.Occupation.Trim();
 
                ///MonthlyIncome
                string NextPart27 = NextPart26.Substring(NextPart26.IndexOf("<td width=\"51%\" class=\"blocktext\">Monthly Income</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.MonthlyIncome = NextPart27.Substring(NextPart27.IndexOf("<td width=\"51%\" class=\"blocktext\">Monthly Income</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Monthly Income</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart27.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart27.IndexOf("<td width=\"51%\" class=\"blocktext\">Monthly Income</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<td width=\"51%\" class=\"blocktext\">Monthly Income</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
                pro.MonthlyIncome = pro.MonthlyIncome.Trim();
 
                ///EducationDetails
                string NextPart28 = NextPart27.Substring(NextPart27.IndexOf("<tr bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"51%\" class=\"blocktext\">Education \r\n                                    Details</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">"));
                pro.EducationDetails = NextPart28.Substring(NextPart28.IndexOf("<tr bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"51%\" class=\"blocktext\">Education \r\n                                    Details</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<tr bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"51%\" class=\"blocktext\">Education \r\n                                    Details</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length, NextPart28.IndexOf("</td>\r\n                                </tr>\r\n                              ") - (NextPart28.IndexOf("<tr bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"51%\" class=\"blocktext\">Education \r\n                                    Details</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">") + "<tr bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"51%\" class=\"blocktext\">Education \r\n                                    Details</td>\r\n                                  <td width=\"5%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"44%\" class=\"blocktext\">".Length));
                pro.EducationDetails = pro.EducationDetails.Trim();
 
                ///MoreAboutMe
                string NextPart29 = NextPart28.Substring(NextPart28.IndexOf("<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"22%\" class=\"blocktext\">More about \r\n                                    me</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">"));
                pro.MoreAboutMe = NextPart29.Substring(NextPart29.IndexOf("<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"22%\" class=\"blocktext\">More about \r\n                                    me</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">") + "<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"22%\" class=\"blocktext\">More about \r\n                                    me</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">".Length, NextPart29.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart29.IndexOf("<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"22%\" class=\"blocktext\">More about \r\n                                    me</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">") + "<tr valign=\"top\" bgcolor=\"FFF7F8\"> \r\n                                  <td width=\"22%\" class=\"blocktext\">More about \r\n                                    me</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">".Length));
 
                ///MyHabits
                string NextPart30 = NextPart29.Substring(NextPart29.IndexOf("<td width=\"22%\" class=\"blocktext\">My habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">"));
                pro.MyHabits = NextPart30.Substring(NextPart30.IndexOf("<td width=\"22%\" class=\"blocktext\">My habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">") + "<td width=\"22%\" class=\"blocktext\">My habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">".Length, NextPart30.IndexOf("</td>\r\n                                </tr>\r\n                                ") - (NextPart30.IndexOf("<td width=\"22%\" class=\"blocktext\">My habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">") + "<td width=\"22%\" class=\"blocktext\">My habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\">".Length));
 
                ///MyBadHabits
                string NextPart31 = NextPart30.Substring(NextPart30.IndexOf("<td width=\"22%\" class=\"blocktext\" height=\"2\">My \r\n                                    bad habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\" height=\"2\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\" height=\"2\">"));
                pro.MyBadHabits = NextPart31.Substring(NextPart31.IndexOf("<td width=\"22%\" class=\"blocktext\" height=\"2\">My \r\n                                    bad habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\" height=\"2\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\" height=\"2\">") + "<td width=\"22%\" class=\"blocktext\" height=\"2\">My \r\n                                    bad habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\" height=\"2\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\" height=\"2\">".Length, NextPart31.IndexOf("</td>\r\n                                </tr>\r\n                              ") - (NextPart31.IndexOf("<td width=\"22%\" class=\"blocktext\" height=\"2\">My \r\n                                    bad habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\" height=\"2\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\" height=\"2\">") + "<td width=\"22%\" class=\"blocktext\" height=\"2\">My \r\n                                    bad habits</td>\r\n                                  <td width=\"2%\" class=\"blocktext\" height=\"2\">:</td>\r\n                                  <td width=\"76%\" class=\"blocktext\" height=\"2\">".Length));
 
 
                dbx.Profiles.InsertOnSubmit(pro);
                dbx.SubmitChanges();
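            }
            catch (Exception e)
            {
                // Pages that are missing or do not match the expected layout end up here; report the reason instead of failing silently.
                Console.WriteLine(e.Message);
            }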
        }

The code above has two functions. The first, DownloadWebPage, takes the URL of a web page as input and returns the content of the page as a string.
The second, ProcessWebPage, takes the content of a profile page as a string together with the profile ID, which the Geo team uses as the primary identifier for a profile. It splits the page content into fields and inserts them into the table created above via LINQ.
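
Each field in ProcessWebPage is extracted the same way: find a start marker, skip past it, and read up to the closing cell markup. A minimal sketch of that pattern as a reusable helper is shown here. The names HtmlFieldExtractor and ExtractBetween are hypothetical and not part of the original code. Unlike the original, this sketch looks for the end marker only after the start marker, always trims the result, and returns null when a marker is missing.

```csharp
using System;

public static class HtmlFieldExtractor
{
    // Hypothetical helper: returns the trimmed text between the first occurrence of
    // startMarker and the next occurrence of endMarker after it.
    // Returns null if either marker cannot be found.
    public static string ExtractBetween(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0)
        {
            return null;
        }

        // Move past the marker so that only the cell content is captured.
        start += startMarker.Length;

        int end = html.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0)
        {
            return null;
        }

        return html.Substring(start, end - start).Trim();
    }
}
```

With a helper like this, each field would need one call with its own start marker instead of repeating the IndexOf and Substring arithmetic, and a null result would signal a page that does not match the expected layout.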

Finally, the following code loops through the profiles and extracts the data using the above two methods.

            for (int i = 288781; i < 300000; i++)
            {
                ProcessWebPage(DownloadWebPage("http://shaadionline.tv/details.asp?id=" + i), i);
            }

So we are done. A small utility for extracting data from the http://www.shaadionline.tv web site is ready.
The basic benefit of this activity is that we have all the profiles in a local database and can search on all fields in any combination.
If you need the working code, contact me.
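
As an illustration of the kind of search the site itself did not offer, here is a rough sketch that filters profiles on habits, bad habits, and description. It runs against an in-memory list so that it is self-contained. ProfileRow and ProfileSearch are hypothetical names, only a few columns of the Profiles table are shown, and the same Where clauses could presumably be written against the LINQ table used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a row of the Profiles table (a few columns only).
public class ProfileRow
{
    public string ProfileID { get; set; }
    public string MyHabits { get; set; }
    public string MyBadHabits { get; set; }
    public string MoreAboutMe { get; set; }
}

public static class ProfileSearch
{
    // Case-insensitive "contains" check that treats a null column as no match.
    private static bool Has(string text, string keyword)
    {
        return text != null && text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Returns profiles whose habits or description mention 'wanted'
    // and whose bad habits do not mention 'unwanted'.
    public static List<ProfileRow> Find(IEnumerable<ProfileRow> rows, string wanted, string unwanted)
    {
        return rows
            .Where(p => Has(p.MyHabits, wanted) || Has(p.MoreAboutMe, wanted))
            .Where(p => !Has(p.MyBadHabits, unwanted))
            .ToList();
    }
}
```

A call such as ProfileSearch.Find(rows, "reading", "smoking") would combine three fields in one query, which is the combination the online search form could not express.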
