There are multiple ways to get a PDF version of a file, so I figured I’d show how you can take the path to a file in SharePoint and use the Microsoft Graph API to get a PDF version of that file. I’ll be using the Graph drive item conversion API for this.
A sample URL could look something like this: https://contoso.sharepoint.com/sites/asite/FooLib/lala/Document.docx
[Update]
After posting the question on Stack Overflow I received an answer from Vadim Gremyachev which takes it down to one API call.
Basically he clued me in on how you can create a sharing token from the item URL, which in effect works as the file id. Code for this is listed in the Graph Sharing API docs.
First you base64 encode the URL, strip the trailing = padding, replace / with _ and + with -, and prefix the result with u!. Then you access the file via the /shares API. The code below uses PowerShell to construct the token.
$url = 'https://contoso.sharepoint.com/sites/asite/FooLib/lala/Document.docx'
"u!"+[Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($url)).TrimEnd('=').Replace('/','_').Replace('+','-')
u!aHR0cHM6Ly9jb250b3NvLnNoYXJlcG9pbnQuY29tL3NpdGVzL2FzaXRlL0Zvb0xpYi9sYWxhL0RvY3VtZW50LmRvY3g
Armed with the token, the resulting API call is:
https://graph.microsoft.com/v1.0/shares/u!aHR0cHM6Ly9jb250b3NvLnNoYXJlcG9pbnQuY29tL3NpdGVzL2FzaXRlL0Zvb0xpYi9sYWxhL0RvY3VtZW50LmRvY3g/driveItem/content?format=pdf
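For readers not on PowerShell, the same token construction can be sketched in Python. This is a minimal sketch that mirrors the one-liner above; the function names `sharing_token` and `pdf_url_from_share` are my own and not part of any Graph SDK.

```python
import base64

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def sharing_token(file_url: str) -> str:
    """Encode a file URL as a sharing token: 'u!' + unpadded base64url."""
    encoded = base64.b64encode(file_url.encode("utf-8")).decode("ascii")
    # Same substitutions as the PowerShell version: drop '=' padding,
    # then swap '/' for '_' and '+' for '-'.
    return "u!" + encoded.rstrip("=").replace("/", "_").replace("+", "-")


def pdf_url_from_share(file_url: str) -> str:
    """Build the single /shares call that asks for the PDF rendition."""
    return f"{GRAPH_ROOT}/shares/{sharing_token(file_url)}/driveItem/content?format=pdf"


if __name__ == "__main__":
    sample = "https://contoso.sharepoint.com/sites/asite/FooLib/lala/Document.docx"
    print(pdf_url_from_share(sample))
```

The request itself still needs a valid access token in the Authorization header, which is outside the scope of this sketch.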
[Original post]
In order to get to the actual file two API calls are needed, one to fetch the drive (library) id, and one to fetch the file.
Note: This solution will not work on the root site collection, as I make assumptions about the number of parts in the URL. The following file formats are supported: csv, doc, docx, odp, ods, odt, pot, potm, potx, pps, ppsx, ppsxm, ppt, pptm, pptx, rtf, xls, xlsx.
Deconstructing the file URL
Splitting the URL on slashes we get the parts needed to get the id of the document library and the id of the file.
0 https:
1
2 contoso.sharepoint.com
3 sites
4 asite
5 FooLib
6 lala
7 Document.docx
Part 2 is the tenant hostname, parts 3 and 4 are the site path, part 5 is the document library, and parts 6 onward are the item path relative to the document library.
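The split described above could look roughly like this in Python. It is a sketch under the same assumption as the post (a site under /sites/ or similar, not the root site collection); `FileUrlParts` and `deconstruct` are illustrative names of my own.

```python
from typing import NamedTuple


class FileUrlParts(NamedTuple):
    hostname: str   # part 2
    site_path: str  # parts 3 and 4, e.g. "sites/asite"
    library: str    # part 5
    item_path: str  # parts 6 onward, relative to the library


def deconstruct(file_url: str) -> FileUrlParts:
    """Split a SharePoint file URL on slashes and pick out the pieces."""
    parts = file_url.split("/")
    # A file directly in the library gives 7 parts; anything shorter
    # cannot match the assumed layout (for example a root site collection).
    # Longer root-site URLs are NOT detected and would be mis-parsed.
    if len(parts) < 7:
        raise ValueError(f"Unexpected URL layout: {file_url}")
    return FileUrlParts(
        hostname=parts[2],
        site_path="/".join(parts[3:5]),
        library=parts[5],
        item_path="/".join(parts[6:]),
    )


if __name__ == "__main__":
    print(deconstruct("https://contoso.sharepoint.com/sites/asite/FooLib/lala/Document.docx"))
```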
Getting the drive id (id of document library)
Using the sample URL above we combine the sites and drives APIs in one query:
/v1.0/sites/{hostname}:{server-relative-path}:/drives
resulting in the following query, where we select id and webUrl:
https://graph.microsoft.com/v1.0/sites/contoso.sharepoint.com:/sites/asite:/drives?$select=id,webUrl
The output of this call is a list of all the libraries in the site.
{
  "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#drives(id,webUrl)",
  "value": [
    {
      "id": "b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_psgYyKuXH2VR7fGsvWPyBOt",
      "webUrl": "https://contoso.sharepoint.com/sites/asite/Documents"
    },
    {
      "id": "b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_pv8T5clDnpiRZq2uVmXgGRU",
      "webUrl": "https://contoso.sharepoint.com/sites/asite/FooLib"
    },
    {
      "id": "b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_psUQF8PSnx9T7aXwvRalLc_",
      "webUrl": "https://contoso.sharepoint.com/sites/asite/PublishingImages"
    },
    {
      "id": "b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_pv01hj6qcWyR5wulob7Lk7-",
      "webUrl": "https://contoso.sharepoint.com/sites/asite/Pages"
    },
    {
      "id": "b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_pvEaXdch-3DToEk0qR4g-xx",
      "webUrl": "https://contoso.sharepoint.com/sites/asite/SiteCollectionDocuments"
    },
    {
      "id": "b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_ptwBh2OaBQOTbJMXT5jLKwi",
      "webUrl": "https://contoso.sharepoint.com/sites/asite/SiteCollectionImages"
    },
    {
      "id": "b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_pv-q5N0D8gWSLB-0MY7_RS3",
      "webUrl": "https://contoso.sharepoint.com/sites/asite/Translation%20Packages"
    }
  ]
}
Ideally you would use a $filter query to pick out just the library you want, but this is not supported for the drives endpoint, so you need to post-filter yourself.
By picking the item whose webUrl matches the library URL (the scheme plus parts 2 through 5: hostname, site path and library name), you have the library you are looking for.
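The query and the post-filtering step can be sketched in Python as follows. The helper names `drives_query_url` and `find_drive_id` are hypothetical, and comparing the URLs case-insensitively after percent-decoding is my own assumption about what counts as a match, not something the Graph docs prescribe.

```python
from typing import Optional
from urllib.parse import unquote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def drives_query_url(hostname: str, site_path: str) -> str:
    """Build the combined sites + drives query for one site."""
    return f"{GRAPH_ROOT}/sites/{hostname}:/{site_path}:/drives?$select=id,webUrl"


def _normalize(url: str) -> str:
    # Assumption: decode %20 and friends, ignore case and trailing slashes.
    return unquote(url).rstrip("/").lower()


def find_drive_id(drives_response: dict, library_url: str) -> Optional[str]:
    """Return the id of the drive whose webUrl equals the library URL."""
    wanted = _normalize(library_url)
    for drive in drives_response.get("value", []):
        if _normalize(drive.get("webUrl", "")) == wanted:
            return drive["id"]
    return None  # no library matched


if __name__ == "__main__":
    print(drives_query_url("contoso.sharepoint.com", "sites/asite"))
```

Here `drives_response` is the parsed JSON body of the drives call, and `library_url` is the file URL cut off after the library name, for example https://contoso.sharepoint.com/sites/asite/FooLib.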
Getting the PDF URL for the file
With the id of the document library in hand, it’s time for the next query which will return the URL of the PDF version in a 302 Location header.
/v1.0/drives/{drive-id}/root:/{item-path}:/content?format=pdf
Using the drive id from the previous call together with the item path relative to the library (part 6 onward, so without the library name), I end up with the following URL:
https://graph.microsoft.com/v1.0/drives/b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_pv8T5clDnpiRZq2uVmXgGRU/root:/lala/Document.docx:/content?format=pdf
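Assembling that URL from the drive id and the item path might look like this in Python. It is a sketch that assumes the item path is relative to the library root (lala/Document.docx for the sample URL, as in the deconstruction step); `pdf_content_url` is a name of my own, and percent-encoding the path is my addition for file names containing spaces.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def pdf_content_url(drive_id: str, item_path: str) -> str:
    """Build the path-based content URL that requests a PDF rendition."""
    # quote() leaves '/' alone by default, so folder separators survive
    # while spaces and other unsafe characters get percent-encoded.
    encoded_path = quote(item_path.strip("/"))
    return f"{GRAPH_ROOT}/drives/{drive_id}/root:/{encoded_path}:/content?format=pdf"


if __name__ == "__main__":
    drive_id = "b!H11aFSof8062NsPf4rr-qE3OKQpUIjVEp7PzqdeT_pv8T5clDnpiRZq2uVmXgGRU"
    print(pdf_content_url(drive_id, "lala/Document.docx"))
```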
If you look at the Location header in the returned response you will find something similar to:
https://northeurope1-mediap.svc.ms/transform/pdf?provider=spo&inputFormat=docx&cs=N2FiNzg2….
This is a pre-authenticated URL which can be called directly from anywhere without needing to log in. The URL is valid for a few minutes only.
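Many HTTP clients follow redirects automatically, which hides the Location header. Below is a minimal Python sketch that reads it instead, using the standard library's http.client, which does not follow redirects on its own. The function name `get_pdf_location` and its `connection_cls` parameter are my own inventions, and obtaining `access_token` is assumed to happen elsewhere.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def get_pdf_location(content_url: str, access_token: str,
                     connection_cls=http.client.HTTPSConnection) -> Optional[str]:
    """Call the content URL once and return the 302 Location header, if any."""
    target = urlsplit(content_url)
    path = target.path + ("?" + target.query if target.query else "")
    conn = connection_cls(target.netloc)
    try:
        conn.request("GET", path,
                     headers={"Authorization": f"Bearer {access_token}"})
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        if response.status != 302:
            return None  # errors and non-redirect answers are not handled here
        return response.getheader("Location")
    finally:
        conn.close()
```

The `connection_cls` parameter exists only so the function can be exercised without network access; in normal use you would leave it at its default. Since the returned URL expires after a few minutes, it makes sense to download the PDF right away instead of storing the link.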