What is the best way to parse large amounts of formatted text data into a table so that it can be retrieved with as much formatting retained as possible - particularly paragraphs? Will each paragraph need to be inserted into its own row to be retrieved as a paragraph?
Wrap each paragraph in an HTML <p>paragraph goes here</p> element. The paragraphs can sit together in one field, just like this:
<p>1</p><p>2</p><p>3</p>
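If the source text arrives as plain text with blank lines between paragraphs, the wrapping could be done in code before the insert. Here is a minimal sketch, assuming a hypothetical helper named ToHtmlParagraphs (not part of .NET) and assuming paragraphs are separated by a blank line:

```vb
Imports System
Imports System.Text
Imports System.Web

Public Module ParagraphFormatter
    ' Hypothetical helper: wraps each blank-line-separated paragraph in <p> tags.
    Public Function ToHtmlParagraphs(ByVal rawText As String) As String
        ' Normalize Windows line endings so one separator covers both styles.
        Dim normalized As String = rawText.Replace(vbCrLf, vbLf)
        Dim paragraphs() As String = normalized.Split( _
            New String() {vbLf & vbLf}, StringSplitOptions.RemoveEmptyEntries)

        Dim html As New StringBuilder()
        For Each paragraph As String In paragraphs
            Dim trimmed As String = paragraph.Trim()
            If trimmed.Length = 0 Then Continue For ' skip whitespace-only chunks
            ' Encode the text itself so stray < or & characters cannot break the markup.
            html.Append("<p>").Append(HttpUtility.HtmlEncode(trimmed)).Append("</p>")
        Next
        Return html.ToString()
    End Function
End Module
```

For example, ToHtmlParagraphs("1" & vbLf & vbLf & "2") returns <p>1</p><p>2</p>, which can then be stored in a single text column.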
|||
Thanks, Bryan.
To follow up, I am having trouble retrieving the text data in a paragraph format. For example, I input <p>1</p><p>2</p><p>3</p> into a text field in the database and dropped the table onto a page in Visual Web Developer. The data looked exactly as it was input into the db (i.e., <p>1</p><p>2</p><p>3</p>), with the paragraph tags showing but no actual paragraphs. Is there a trick to getting the paragraph formatting to show up properly in the control on my web page?
It depends on what kind of control you are trying to show the value in. Some controls render their output to the page as HTML. You have to understand what really happens when an ASPX page is rendered. Browsers don't know what ASP.NET is. They don't care; they never see it. This is why you need an ASP.NET server and IIS. The engine on the server renders your .NET code down to HTML that the browser can understand. If you view the source of one of your pages, you'll notice all you have is a bunch of HTML and JavaScript.
The example below points this out by using a label (which just throws its value into the HTML stream) and a textbox (which displays its value as literal text, tags and all).
<asp:Label ID="label1" runat="server"></asp:Label>
<asp:TextBox ID="textbox1" runat="server" />
Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    label1.Text = "<p>1</p><p>2</p><p>3</p>"
    textbox1.Text = "<p>1</p><p>2</p><p>3</p>"
End Sub
|||Bryan - thanks for the input. But I am a little baffled, since all three controls on my page are ASP (and not simply HTML) and all are runat="server" (which I understand to mean that the server converts the ASP to HTML for the browser). But only the ASP label has the paragraph formatting. Since I don't expect to be using the label control as a primary means of displaying the extensive text data from the db, is it simply a matter of setting control properties on other controls (like the GridView DataFormatString property) to get them to handle the <p> tags? Or am I simply limited in my choice of controls that can do this job?
Not all controls will render, for lack of a better term, renderable HTML from their data or text property. Inside GridViews, for example, you'll probably want to use bound labels or Literal controls. I think you may be missing something conceptually here. Explain to me what type of GridView you are rendering; maybe I can shed some light on the subject and give you an example of how you'd do what you're trying to do.
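In the meantime, here is a rough sketch of two ways a GridView column could let stored <p> tags through. The grid ID and the "Title" and "Body" column names are made up for illustration; substitute whatever your data source actually returns.

```aspx
<asp:GridView ID="articlesGrid" runat="server" AutoGenerateColumns="False">
    <Columns>
        <asp:BoundField DataField="Title" HeaderText="Title" />

        <%-- Option 1: a bound column with HTML encoding switched off. --%>
        <asp:BoundField DataField="Body" HeaderText="Body" HtmlEncode="False" />

        <%-- Option 2: a template column holding a data-bound Label. --%>
        <asp:TemplateField HeaderText="Body (label)">
            <ItemTemplate>
                <asp:Label ID="bodyLabel" runat="server" Text='<%# Eval("Body") %>' />
            </ItemTemplate>
        </asp:TemplateField>
    </Columns>
</asp:GridView>
```

You would normally pick one of the two options, not both. Either way the stored markup reaches the browser unencoded, so this only makes sense for text you trust.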
|||The project I am considering involves providing search capability on numerous, various documents (mostly magazine articles and research papers), each of which is several pages long with fairly complex formatting (if I retain the original look). The product needs to be available online and on CDs. The actual search results will probably contain document titles or abstracts that can be selected so that the full article will be displayed - probably a GridView or similar control for the brief results list, with some type of details view counterpart for the selected text content display. My choice of controls may need to include Windows Forms controls as well as web page controls.
At this early stage I am wondering if I should put the formatted documents into the database as varbinary data that can be searched and retrieved maintaining the formatting, or just put the text data into the db with simple paragraph formatting that will give a uniform look to the search output. I have done a bit of database work involving simple data searching, etc., but never with extensive, complex text fields such as this where the formatting in the output is important.
Thanks for the insights so far and for any additional help you can provide.
|||
Ok, I think you've done a pretty good job thinking through this so far, but I'll offer up a little advice to you. First of all, probably the best and most accepted web format for complex documents is PDF. There are numerous .NET utilities to encode input into a PDF, including converters that can take Word documents, Excel spreadsheets, StarOffice files, just about anything you can think of, and turn it into a PDF for your purposes. Now, my advice would be to NOT put all this information in the database. To truly allow a full-text search that way, you are going to cause yourself a huge amount of work and headache.
Store a path to the PDF in your database, so you know where it is, and render a title plus document information such as creation time, last modified, owner, short description, etc.
Now the magic comes in. Tie into the MS Indexing Service. It can allow a really slick full-text search, and there is a connector that allows the Indexing Service to search inside PDFs. So all the text inside the PDF itself will be indexed and included in the search criteria. What's returned from a call to the Indexing Service is a location to the document (your ASPX form). My advice is to break it out like this:
Site > Document List > Document Information Page with a link to the document (derived from the path in your database)
When the MS Indexing Service returns, it will return the location of your Document Information Page (as a link), or a list of them depending on how many results were returned. Clicking a link will take you directly to the pertinent information.
Here is a great article on accomplishing this.
http://aspnet.4guysfromrolla.com/articles/033005-1.aspx
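To give a feel for what the call into the Indexing Service might look like, here is a rough sketch. It rests on assumptions you should check against the article: the catalog name "Web" is a placeholder, the DocumentSearch module and Search function are hypothetical names, and the sketch assumes the catalog is reachable through the Indexing Service OLE DB provider (MSIDXS) and is tied to your web site so that VPath is populated.

```vb
Imports System.Data
Imports System.Data.OleDb

Public Module DocumentSearch
    ' Assumption: an Indexing Service catalog named "Web" covers the document folder.
    Private Const CatalogConnectionString As String = "Provider=MSIDXS;Data Source=Web"

    ' Hypothetical helper: returns one row per matching document, best match first.
    Public Function Search(ByVal terms As String) As DataTable
        ' Double up single quotes so the search terms stay inside the string literal.
        Dim safeTerms As String = terms.Replace("'", "''")

        Dim query As String = _
            "SELECT DocTitle, VPath, Characterization, Rank " & _
            "FROM SCOPE() " & _
            "WHERE FREETEXT('" & safeTerms & "') " & _
            "ORDER BY Rank DESC"

        Dim results As New DataTable()
        Using adapter As New OleDbDataAdapter(query, CatalogConnectionString)
            ' Fill opens and closes the connection on its own.
            adapter.Fill(results)
        End Using
        Return results
    End Function
End Module
```

The returned table could be bound to a GridView for the results list, with VPath supplying the link to each Document Information Page.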
|||Thanks again Bryan. I have been looking at that article and will try that approach.
Follow-up question on the previous discussion on controls. Are you aware of any web controls that render HTML besides the label? I would think the textbox would have a property that would support HTML rendering, but I can't find any.
|||A textbox will not. You'll need an HTML-based textbox that can support it, much like the one you use to post comments on this site. There are several out there. FCKeditor is the one I usually use, but there's also FreeTextBox, which is the other big free one.
An asp:Literal control gives you options, through its Mode property, to pass HTML-formatted output through as-is, transform it, or encode it. That may be a great display control for you, but .NET has no built-in controls that are directly editable and support HTML natively.
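As a small illustration of the Literal modes (the control IDs here are made up), the same stored text shows up differently depending on Mode:

```aspx
<%-- PassThrough: the <p> tags reach the browser and render as paragraphs. --%>
<asp:Literal ID="articleBody" runat="server" Mode="PassThrough" />

<%-- Encode: the tags are HTML-encoded, so the reader sees them as literal text. --%>
<asp:Literal ID="articleSource" runat="server" Mode="Encode" />
```

Setting articleBody.Text and articleSource.Text to the same "<p>1</p><p>2</p><p>3</p>" string in Page_Load should show three paragraphs in the first control and the raw tags in the second.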
|||Bryan. Thanks for all your help in sorting this out. Good to have some options as I move forward with this project.