SharePoint Online: Get All Documents Inventory in a Site using PowerShell
Requirement: List All Documents in SharePoint Online Site Collection using PowerShell.
SharePoint Online PowerShell to Get All Documents
Have you ever needed to get a list of all the documents on a SharePoint Online site? In this blog post, we will look at how you can use PowerShell to get all the documents in a SharePoint Online site. This can be useful if you need to gather information about all the documents on a site quickly, or if you have to export all the documents inventory to a CSV file.
Here is the PowerShell to list all documents from document libraries in SharePoint Online site collection and export to CSV:
#Load SharePoint CSOM Assemblies
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"
#Function to Generate Report on all documents in a SharePoint Online Site Collection
Function Get-SPODocumentInventory($SiteURL)
{
Try {
#Setup the context
$Ctx = New-Object Microsoft.SharePoint.Client.ClientContext($SiteURL)
$Ctx.Credentials = $Credentials
#Get the web from given URL and its subsites
$Web = $Ctx.web
$Ctx.Load($Web)
$Ctx.Load($Web.Lists)
$Ctx.Load($web.Webs)
$Ctx.executeQuery()
#Arry to Skip System Lists and Libraries
$SystemLists =@("Converted Forms", "Master Page Gallery", "Customized Reports", "Form Templates", "List Template Gallery", "Theme Gallery",
"Reporting Templates", "Solution Gallery", "Style Library", "Web Part Gallery","Site Assets", "wfpub", "Site Pages", "Images")
Write-host -f Yellow "Processing Site: $SiteURL"
#Filter Document Libraries to Scan
$Lists = $Web.Lists | Where {$_.BaseType -eq "DocumentLibrary" -and $_.Hidden -eq $false -and $SystemLists -notcontains $_.Title -and $_.ItemCount -gt 0}
#Loop through each document library
Foreach ($List in $Lists)
{
#Define CAML Query to Get List Items in batches
$Query = New-Object Microsoft.SharePoint.Client.CamlQuery
$Query.ViewXml ="
<View Scope='RecursiveAll'>
<Query>
<OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy>
</Query>
<RowLimit Paged='TRUE'>$BatchSize</RowLimit>
</View>"
Write-host -f Cyan "`t Processing Document Library: '$($List.Title)' with $($List.ItemCount) Item(s)"
Do {
#Get List items
$ListItems = $List.GetItems($Query)
$Ctx.Load($ListItems)
$Ctx.ExecuteQuery()
#Filter Files
$Files = $ListItems | Where { $_.FileSystemObjectType -eq "File"}
#Iterate through each file and get data
$DocumentInventory = @()
Foreach($Item in $Files)
{
$File = $Item.File
$Ctx.Load($File)
$Ctx.ExecuteQuery()
$DocumentData = New-Object PSObject
$DocumentData | Add-Member NoteProperty SiteURL($SiteURL)
$DocumentData | Add-Member NoteProperty DocLibraryName($List.Title)
$DocumentData | Add-Member NoteProperty FileName($File.Name)
$DocumentData | Add-Member NoteProperty FileURL($File.ServerRelativeUrl)
$DocumentData | Add-Member NoteProperty CreatedBy($Item["Author"].Email)
$DocumentData | Add-Member NoteProperty CreatedOn($File.TimeCreated)
$DocumentData | Add-Member NoteProperty ModifiedBy($Item["Editor"].Email)
$DocumentData | Add-Member NoteProperty LastModifiedOn($File.TimeLastModified)
$DocumentData | Add-Member NoteProperty Size-KB([math]::Round($File.Length/1KB))
#Add the result to an Array
$DocumentInventory += $DocumentData
}
#Export the result to CSV file
$DocumentInventory | Export-CSV $ReportOutput -NoTypeInformation -Append
$Query.ListItemCollectionPosition = $ListItems.ListItemCollectionPosition
} While($Query.ListItemCollectionPosition -ne $null)
}
#Iterate through each subsite of the current web and call the function recursively
ForEach ($Subweb in $Web.Webs)
{
#Call the function recursively to process all subsites underneaththe current web
Get-SPODocumentInventory($Subweb.url)
}
}
Catch {
write-host -f Red "Error Generating Document Inventory!" $_.Exception.Message
}
}
#Config Parameters
$SiteCollURL="https://crescent.sharepoint.com/sites/marketing"
$ReportOutput="C:\temp\DocInventory.csv"
$BatchSize = 500
#Setup Credentials to connect
$Cred= Get-Credential
$Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Cred.Username, $Cred.Password)
#Delete the Output Report, if exists
if (Test-Path $ReportOutput) { Remove-Item $ReportOutput }
#Call the function
Get-SPODocumentInventory $SiteCollURL
Change the site collection URL and report output path in the script and run:
This gets a detailed report of all documents on the SharePoint Online site using PowerShell. Below is the result of PowerShell to list all documents:
If you just need to list all documents in a single document library instead of the whole site collection, use: PowerShell to List All documents in a Library in SharePoint Online
PnP PowerShell to Get All Documents Inventory from a SharePoint Online Site
Are you looking for a way to quickly gather all the documents in a given SharePoint Online site without navigating through the site structure manually? Sure, Let me show you how to use PowerShell to get all the documents inventory from a site using PnP PowerShell!
Here is the PnP PowerShell way to get all files from all document libraries of a given site collection to a CSV file:
#Parameters
$SiteURL = "https://crescent.SharePoint.com/sites/Marketing"
$CSVPath = "C:\Temp\DocumentInventory.csv"
$global:DocumentInventory = @()
$Pagesize = 2000
#Function to scan and collect Document Inventory
Function Get-DocumentInventory
{
[cmdletbinding()]
param([parameter(Mandatory = $true, ValueFromPipeline = $true)] $Web)
Write-host "Getting Documents Inventory from Site '$($Web.URL)'" -f Yellow
Connect-PnPOnline -Url $Web.URL -Interactive
#Calculate the URL of the tenant
If($Web.ServerRelativeUrl -eq "/")
{
$TenantURL = $Web.Url
}
Else
{
$TenantURL = $Web.Url.Replace($Web.ServerRelativeUrl,'')
}
#Exclude certain libraries
$ExcludedLists = @("Form Templates", "Preservation Hold Library","Site Assets", "Pages", "Site Pages", "Images",
"Site Collection Documents", "Site Collection Images","Style Library")
#Get All Document Libraries from the Web
Get-PnPList -PipelineVariable List | Where-Object {$_.BaseType -eq "DocumentLibrary" -and $_.Hidden -eq $false -and $_.Title -notin $ExcludedLists -and $_.ItemCount -gt 0} | ForEach-Object {
#Get Items from List
$global:counter = 0;
$ListItems = Get-PnPListItem -List $_ -PageSize $Pagesize -Fields Author, Created, File_x0020_Type,File_x0020_Size -ScriptBlock `
{ Param($items) $global:counter += $items.Count; Write-Progress -PercentComplete ($global:Counter / ($_.ItemCount) * 100) -Activity "Getting Documents from '$($_.Title)'" -Status "Processing Items $global:Counter to $($_.ItemCount)";} | Where {$_.FileSystemObjectType -eq "File"}
Write-Progress -Activity "Completed Retrieving Documents from Library $($List.Title)" -Completed
#Get Root folder of the List
$Folder = Get-PnPProperty -ClientObject $_ -Property RootFolder
#Iterate through each document and collect data
ForEach($ListItem in $ListItems)
{
#Collect document data
$global:DocumentInventory += New-Object PSObject -Property ([ordered]@{
SiteName = $Web.Title
SiteURL = $Web.URL
LibraryName = $List.Title
ParentFolder = $Folder.ServerRelativeURL
FileName = $ListItem.FieldValues.FileLeafRef
FileType = $ListItem.FieldValues.File_x0020_Type
FileSize = [math]::Round($ListItem.FieldValues.File_x0020_Size/1KB)
AbsoluteURL = "$TenantURL$($ListItem.FieldValues.FileRef)"
CreatedBy = $ListItem.FieldValues.Author.Email
CreatedAt = $ListItem.FieldValues.Created
ModifiedBy = $ListItem.FieldValues.Editor.Email
ModifiedAt = $ListItem.FieldValues.Modified
})
}
}
}
#Connect to Site collection
Connect-PnPOnline -Url $SiteURL -Interactive
#Call the Function for Webs
Get-PnPSubWeb -Recurse -IncludeRootWeb | ForEach-Object { Get-DocumentInventory $_ }
#Export Documents Inventory to CSV
$Global:DocumentInventory | Export-Csv $CSVPath -NoTypeInformation
Write-host "Documents Inventory Report has been Exported to '$CSVPath'" -f Green
Here is another post for SharePoint On-Premises to get all documents from the SharePoint document library: Get All Documents Inventory in SharePoint using PowerShell
This is a great script. When I’m running it in PS 7 with pnp.powershell it stops after about 14 sites and the terminal/code stops responding. Works fine in powershell 5 however. Any idea why that is.
I want to get all files that are modified in the past one year. Can you help?
Use this logic:
$DateFilter = (Get-Date).AddDays(-365).Date
If($File.TimeLastModified -gt $DateFilter)
{
#Collect File Details
}
Hi Salaudeen and thank you for your scripts.
I am using your second script on this page (PnP for SPO), which runs fine as is.
I tried adding a file size column to this script unsuccessfully, by inserting a line for FileSize under the FileType line:
FileType = $ListItem.FieldValues.File_x0020_Type
FileSize = $ListItem.FieldValues.File_x0020_Size
Can you please advise how we can include a column showing filesize with this PnP SPO inventory script ?
Thanks in Advance….
Hi Andre,
Both “FileType = $ListItem.FieldValues.File_x0020_Type” and “FileSize = $ListItem.FieldValues.File_x0020_Size” should work, when you have “File_x0020_Type” and “File_x0020_Size” fields included in the -Fields parameter of Get-PnPListItem cmdlet.
Hi Salaudeen,
Thanks for the tip, works a treat.
In case it helps any other newbies like me, to update the PnP SPO script to include file size,
update the $ListItems line to this:
$ListItems = Get-PnPListItem -List $_ -PageSize $Pagesize -Fields Author, Created, File_x0020_Type, File_x0020_Size -ScriptBlock `
and add the line:
FileSize = $ListItem.FieldValues.File_x0020_Size
in the ” #Collect document data” section.
Made my day.
Thank you : )
Hi, the output is great when running the first script against a single site collection, but how do i build in this additional section above so that I can create an inventory for all sites in the tenant and export to CSV?
Thanks for the great script. I have been trying to get this to work from the last 2days and i keeping getting this error
Get-DocumentInventory : The term ‘Get-DocumentInventory’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the
name, or if a path was included, verify that the path is correct and try again.
Can you please help him providing a script which can loop through all the site collections and get the list of all libraries from the SPO tenant
I am getting the same issue, only getting 1 document library in the root
Hi, I ran the first script you have mentioned in this page in SPO. I was able to fetch all documents from the root site collection i.e. https://tenant.sharepoint.com. It stopped running after completing the list of documents. However, I am looking for the report which pulls the documents from each and every site collection within this tenant. For example: I want documents from https://tenant.sharepoint.com/sites/….. and https://tenant.sharepoint.com/teams/…. In each /sites and /teams I have thousands of sites and subsites and hence script only can help me in this regard. Could you please help me if I need to modify something in the above script.
I am in the same spot as you are – Is there any hope of a solution for this?
Sure, You can loop through each site collection in the tenant and call the function for each site. Use either -Append switch on Export-CSV cmdlet to append on the same CSV or you can use a new CSV for each site.
Hi,
I am pretty new to PowerShell so excuse the noob question. Do I need to save the script to a PS file first and then run it? Just not sure how to get this all running?
Thanks,
M
Please refer: How to Run PowerShell Scripts for SharePoint Online?
Hi, is it possbile to filter on a specific content type? I have tried to do this with this line
#Define CAML Query to Get List Items in batches
$Query = New-Object Microsoft.SharePoint.Client.CamlQuery
$Query.ViewXml =”
LI Case Document Set
$BatchSize
”
But I get an error
Error Generating Document Inventory! Exception calling “ExecuteQuery” with “0” argument(s): “Cannot complete this action
.
Please try again.”
“Cannot complete this action” error hints the given CAML query is not valid!
Hi great script, thanks. I am running this on a site collection where the list threshold is exceed and I get this error, is there anyway to get around it?
Error Generating Document Inventory! Exception calling “ExecuteQuery” with “0” argument(s): “The attempted operation is prohibited because it exceeds the list view threshold enforced by the administrator.”
Hi Mike,
Script is updated now to handle large lists and avoid list view threshold limit! You can use PnP PowerShell as well.
This comment has been removed by the author.
Hi, I tried the “get all files from all document libraries of a given site collection” script, it works great, but when I add “FileTitle = $ListItem.FieldValues.Title” along with name, url, etc on the list to get the the title of all the documents in my spreadsheet, it doesn’t work, the column is created with no value in it, an empty column. Do you know how to fix it? Thanks a lot in advance.
On document libraries, “Title” field is optional. So, There could be files without title value entered (But “Name” field is mandatory!).
Hi all,
Im trying to run the “SharePoint Online PowerShell to Get All Documents” but I get an error in the xml query. the code I used is the same displayed above:
$Query = New-Object Microsoft.SharePoint.Client.CamlQuery
$Query.ViewXml = “0”
$ListItems=$List.GetItems($Query)
The error I get:
Cannot convert argument “query”, with value: “Microsoft.SharePoint.Client.CamlQuery”, for “GetItems” to type “Microsoft.SharePoint.Client.CamlQuery”:
“Cannot convert the “Microsoft.SharePoint.Client.CamlQuery” value of type “Microsoft.SharePoint.Client.CamlQuery” to type
“Microsoft.SharePoint.Client.CamlQuery”.”
At line:3 char:1
+ $ListItems=$List.GetItems($Query)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodException
+ FullyQualifiedErrorId : MethodArgumentConversionInvalidCastArgument
Any ideas?
Thanks in advanced,
Why $Query.ViewXml = “0”? Shouldn’t it be the Filter to get files?
Sorry, wrong copy/paste, the code is:
…
$Query = New-Object Microsoft.SharePoint.Client.CamlQuery
$Query.ViewXml = “0”
$ListItems=$List.GetItems($Query)
…
Thanks!
Hi, did you find a solution to this? I am getting the same error
Oddly, Restarting PowerShell ISE works most of the times!
Any way you can include the last entry for version history?
Last entry for version history? You can generate version history report for documents using: Version History report for SharePoint Online