The overall goal is to get this HTML outline complete with formatting (blue italics, etc.) into a PowerPoint slide.

The HTML is dynamic, coming from a DB field which is outputted via looping through records, so I never know how many bullets or where the formatting (spans) for blue italics may be.

So this is the HTML:
HTML
<style id="oboutEditorDefaultStyle">
  .blueItalic { color: #0000ff; font-style: italic; }
  body {
    color: #404040; background-color: #fff;
    border-width: 0px;
    margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px;
    padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;
  }
  body, table td { font-family: verdana, sans-serif; font-size: 10pt; }
  h1 { font-size: 24pt; }
  h2 { font-size: 18pt; }
  h3 { font-size: 14pt; }
  h4 { font-size: 12pt; }
  h5 { font-size: 10pt; }
  h6 { font-size: 8pt; }
</style>



<p align='left'>
   <ul> 
	<li id="ShortfallsMain">(G) Shortfalls: abcd eoritu <span 		class="blueItalic">this is blue italics </span>ttt 
	</li>
		<ul>
			<li id="ShortfallsSub1">(G) dflgk <span 				class="blueItalic">dflgk</span>; dflgkdfgk dfgkl dfkl;dd 				ddd 
			</li>
			
			<li id="ShortfallsSub2">(H) dkflgj dfklj <span 				class="blueItalic">retio </span>ert 
			</li>
		</ul>
	<li id="ImpactMain">(F) Impact: dfjkgh dkfjgh dfjkh dfjgh jdfgh 		dfjgh jhfdgjh er dfjgh djfgh dfjgh dfjgh dfjgh dfjh dfjh dfjh <span 		class="blueItalic">dfjghdfgjhdf </span>djfkh dfjh dfjkh dfgjk 
	</li>
		<ul></ul>
	<li id="ConnectMain">(G) Connections: hfgh fghfghfg fghfgh fghfg 
	</li>
		<ul></ul>
	<li id="ResolutionMain">(G) Resolution: 
	</li>
		<ul>
			<li id="ResolutionSub1">(G) eritu eriou <span 				class="blueItalic">eriou 									</span>ert 
			</li>
			<li id="ResolutionSub2">(F) dfjkh dfgjkh <span 				class="blueItalic">dfgkjdhf 									</span>gkjdg 
			</li>
			<li id="ResolutionSub3">(F) dflkgj dflgkj dflgjk <span 				class="blueItalic">dflkgj 									</span>dfg 
			</li>
			<li id="ResolutionSub4">(H) xcvb xcbv xcmvb <span 							class="blueItalic">xopiopiopicvb </span>xcv 
			</li>
		</ul> 
   </ul>
</p>
<br> 
<p align='center'> (F) BD: Feb 15     (F) WD: Jan 16</p>


I am using OpenXML and have gathered this little code snippet from my research:
C#
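using DocumentFormat.OpenXml.Drawing;

CharacterBullet charBull = new CharacterBullet() { Char = "." };
BulletFont fontBull = new BulletFont() { Typeface = "Calibri", PitchFamily = 34, CharacterSet = 0 };
Paragraph prgrph = new Paragraph();
ParagraphProperties prgrphProp = new ParagraphProperties();

// The schema expects the bullet font before the bullet character inside the paragraph properties.
prgrphProp.Append(fontBull);
prgrphProp.Append(charBull);

Run run = new Run();
RunProperties runProp = new RunProperties() { Language = "en-US", SmartTagClean = false };
// Fully qualified to avoid a clash with Text types in other Open XML namespaces.
DocumentFormat.OpenXml.Drawing.Text txt = new DocumentFormat.OpenXml.Drawing.Text();
txt.Text = "";

// Assemble the pieces: run properties first, then the text, then the run into the paragraph.
run.Append(runProp);
run.Append(txt);
prgrph.Append(prgrphProp);
prgrph.Append(run);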


I am new to OpenXML and to XML structure in general, so my question is: how do I parse the HTML (outline, blue italics, etc.) in order to convert it to XML via OpenXML?

Thank you.
Comments
Sergey Alexandrovich Kryukov 10-Apr-15 15:37pm    
Not quite clear. Well-written HTML is already XML (XHTML). If you mean "Open XML", you should have mentioned it in the title, too. (It's good that you added it as a tag.)
The problem is: the "conversion" of HTML into a PowerPoint presentation is ambiguous. You need to define the mapping: specify how the content is broken into those frames and how the fluid HTML content is rendered into the rigid format of the PowerPoint frames.
—SA
Sergey Alexandrovich Kryukov 10-Apr-15 15:39pm    
Is it C# or what?
—SA
pats2Kdynasty 10-Apr-15 16:21pm    
I do have OpenXML in my title, but I did forget to mention that I am using C#. I am not a fan of PowerPoint, but some branches of the Gov't are a bit behind the times and have a tendency to tie a developer's hands. I am going to accept yours as the solution because it seems like a very all-encompassing answer, but to be honest it will take me some time to research everything you have said.
Sergey Alexandrovich Kryukov 10-Apr-15 17:40pm    
Well, there is no such thing as a miracle. All ways need time, but I think my second approach is more fruitful. Which way would be faster depends on at least 3 factors: 1) the content you currently have and/or have to present, 2) the particular requirements for the presentations, 3) your experience. With the second approach, you can save a lot by not dealing with Open XML and PowerPoint-specific stuff, which is a lot... At the same time, with HTML + JavaScript, there are a lot of available libraries, experts' answers and other resources, and a lot more people (including myself) who could help you if you face problems with some particular requirements or effects you want to achieve but get stuck...
—SA

1 solution

First of all, please see my comment to the question.

The direct answer is: parsing HTML is easiest when the HTML is well-formed as XHTML, that is, as XML. Then you can use one of the many available XML parsers. This is my short overview of what you could use with the .NET FCL:
  1. Use the System.Xml.XmlDocument class. It implements the DOM interface; this way is the easiest and good enough if the size of the document is not too big.
    See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx.
  2. Use the class System.Xml.XmlTextReader (in current code, usually a System.Xml.XmlReader obtained from XmlReader.Create); this is the fastest way of reading, especially if you need to skip some data.
    See http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx.
  3. Use the class System.Xml.Linq.XDocument; this way is similar to XmlDocument but also supports LINQ to XML programming.
    See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx, http://msdn.microsoft.com/en-us/library/bb387063.aspx.
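
As a rough illustration of option 3, here is a minimal sketch that walks such an outline with LINQ to XML. It rests on several assumptions: the fragment is well-formed XML (the HTML in the question would first need its unclosed <br> fixed), the elements carry no XHTML namespace, and the types TextRun, Bullet and OutlineParser are hypothetical names invented for this example. It requires C# 9 or later for records. The fragment is wrapped in a synthetic root element because it may have several top-level elements, and the outline level is taken from the number of enclosing <ul> elements, so it works whether a nested <ul> sits inside its <li> or next to it, as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper types for this sketch; they are not part of any library.
public record TextRun(string Text, bool BlueItalic);
public record Bullet(int Level, List<TextRun> Runs);

public static class OutlineParser
{
    public static List<Bullet> Parse(string xhtmlFragment)
    {
        // Wrap in a synthetic root so a fragment with several top-level elements parses.
        XElement root = XElement.Parse("<root>" + xhtmlFragment + "</root>",
                                       LoadOptions.PreserveWhitespace);
        var bullets = new List<Bullet>();
        foreach (XElement li in root.Descendants("li"))
        {
            // Outline level = number of enclosing <ul> elements, zero-based.
            int level = Math.Max(0, li.Ancestors("ul").Count() - 1);
            var runs = new List<TextRun>();
            foreach (XNode node in li.Nodes())
            {
                if (node is XText text)
                    AddRun(runs, text.Value, false);
                else if (node is XElement el && el.Name == "span")
                    AddRun(runs, el.Value, (string)el.Attribute("class") == "blueItalic");
                // Other child elements (for example a nested <ul>) are skipped here;
                // their <li> items are visited by Descendants above.
            }
            bullets.Add(new Bullet(level, runs));
        }
        return bullets;
    }

    private static void AddRun(List<TextRun> runs, string raw, bool blueItalic)
    {
        // Collapse tabs and line breaks into single spaces; drop whitespace-only runs
        // (a simplification that would lose a lone space between two spans).
        string text = Regex.Replace(raw, @"\s+", " ");
        if (text.Trim().Length > 0) runs.Add(new TextRun(text, blueItalic));
    }

    public static void Main()
    {
        string html = @"<ul>
  <li id=""ShortfallsMain"">(G) Shortfalls: abcd <span class=""blueItalic"">this is blue italics </span>ttt</li>
  <ul>
    <li id=""ShortfallsSub1"">(G) dflgk <span class=""blueItalic"">dflgk</span>; ddd</li>
  </ul>
</ul>";
        foreach (Bullet b in Parse(html))
            Console.WriteLine(b.Level + ": " +
                string.Join("|", b.Runs.Select(r => (r.BlueItalic ? "*" : "") + r.Text)));
    }
}
```

Each Bullet then carries what is needed for the mapping: the level for the paragraph indentation and one entry per differently formatted piece of text.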


If your HTML is not that well-formed, you would need to use some other parser. You could try this one, just as one example: http://www.majestic12.co.uk/projects/html_parser.php.
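
Whichever parser is used, the parsed pieces still have to be mapped onto DrawingML paragraphs. The following is only a sketch of one possible mapping, assuming the Open XML SDK (the DocumentFormat.OpenXml package) is referenced: each bullet becomes one paragraph whose level is the outline depth, and each differently formatted piece of text becomes one run, with blue italics expressed as an italic flag plus a solid fill of 0000FF. The class BulletParagraphBuilder, its Build method and the tuple input are hypothetical names for illustration, and the bullet typeface and character are arbitrary choices.

```csharp
using System;
using System.Collections.Generic;
using A = DocumentFormat.OpenXml.Drawing;

// Hypothetical helper for this sketch; not part of the Open XML SDK.
public static class BulletParagraphBuilder
{
    // level: zero-based outline depth; runs: (text, blueItalic) pairs from the parsed HTML.
    public static A.Paragraph Build(int level, IEnumerable<(string Text, bool BlueItalic)> runs)
    {
        var props = new A.ParagraphProperties { Level = level };
        // Schema order matters: the bullet font must precede the bullet character.
        props.Append(new A.BulletFont { Typeface = "Arial" });
        props.Append(new A.CharacterBullet { Char = "•" });

        var paragraph = new A.Paragraph();
        paragraph.Append(props);
        foreach (var (text, blueItalic) in runs)
        {
            var runProps = new A.RunProperties { Language = "en-US" };
            if (blueItalic)
            {
                // Counterpart of the .blueItalic CSS class: italic text filled with #0000ff.
                runProps.Italic = true;
                runProps.Append(new A.SolidFill(new A.RgbColorModelHex { Val = "0000FF" }));
            }
            // Inside a run, the properties come first and the text second.
            paragraph.Append(new A.Run(runProps, new A.Text(text)));
        }
        return paragraph;
    }

    public static void Main()
    {
        A.Paragraph p = Build(1, new[] { ("(G) dflgk ", false), ("dflgk", true), ("; ddd", false) });
        Console.WriteLine(p.OuterXml); // inspect the generated markup
    }
}
```

The resulting paragraphs would still have to be appended to the text body of a shape on the slide; that part depends on how the slide and its placeholders are created, so it is left out here.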


But I have a more interesting suggestion for you.

You see, I once realized that PowerPoint is a pretty bad thing. And one nice day I saw an article explaining why people using PowerPoint typically produce extremely uninformative presentations. The author of the article (sorry, I cannot find a link at this time) even expressed the opinion that this product was designed to please a specific kind of presenter: creators of business presentations who really have nothing to tell people, but who just want to create an impression of some serious content, some vision, and so on, while showing nothing essential at all. It confirmed my own impression, so I stopped using PowerPoint completely in my own presentations.

So, here is my alternative idea: you can create an HTML (+JavaScript) presentation based on your available HTML content. You can create several frames as different HTML pages; you only need to add some controls like "<-", "->" and a menu, or alternatively, you can use just the browser navigation, to save some screen space. You can even do it all using one HTML page, changing the content (say, just the CSS property visibility) using JavaScript.

Moreover, for flexibility and perfect graphics quality, you can completely discard all pixel graphics (bitmaps) and switch to vector graphics. Or you can combine vector graphics with bitmaps. What do you have for vector graphics? More than enough; there are two options, and both are related to HTML5.

First, you can use SVG embedded in HTML, and you can have the images scaled automatically according to your actual inner window size. Please see:
http://en.wikipedia.org/wiki/Scalable_Vector_Graphics.

Moreover, you can use SVG animation: https://css-tricks.com/guide-svg-animations-smil.

Another technology offering you both vector graphics and animation is HTML5 Canvas: http://en.wikipedia.org/wiki/Canvas_element.

With Canvas, you can achieve both rendering and animation using JavaScript. It can give you full freedom in the design of your application and its interactive control; the stupid and ugly PowerPoint cannot match 0.001% of Canvas capabilities (which are also really easy to use). For example, one can illustrate some real production processes, physical phenomena, business processes and a lot more.

This way, you could reuse your existing HTML code and create much better presentations.

[EDIT]

In addition to all of the above: you can run your HTML+JavaScript presentation on a much wider range of systems than PowerPoint. You can even use a smartphone connected to a projector (keeping that smartphone in your hands during the whole presentation). You can even use a stupid smart TV with a memory card and an embedded browser, without any "real" computer…

—SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


